1
Lemas DJ, Du X, Rouhizadeh M, Lewis B, Frank S, Wright L, Spirache A, Gonzalez L, Cheves R, Magalhães M, Zapata R, Reddy R, Xu K, Parker L, Harle C, Young B, Louis-Jaques A, Zhang B, Thompson L, Hogan WR, Modave F. Classifying early infant feeding status from clinical notes using natural language processing and machine learning. Sci Rep 2024; 14:7831. [PMID: 38570569 PMCID: PMC10991582 DOI: 10.1038/s41598-024-58299-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 03/27/2024] [Indexed: 04/05/2024]
Abstract
The objective of this study is to develop and evaluate natural language processing (NLP) and machine learning models to predict infant feeding status from clinical notes in the Epic electronic health records system. The primary outcome was the classification of infant feeding status from clinical notes using Medical Subject Headings (MeSH) terms. Annotation of notes was completed using TeamTat to uniquely classify clinical notes according to infant feeding status. We trained 6 machine learning models to classify infant feeding status: logistic regression, random forest, XGBoost gradient descent, k-nearest neighbors, and support-vector classifier. Model comparison was evaluated based on overall accuracy, precision, recall, and F1 score. Our modeling corpus included an even number of clinical notes that was a balanced sample across each class. We manually reviewed 999 notes that represented 746 mother-infant dyads with a mean gestational age of 38.9 weeks and a mean maternal age of 26.6 years. The most frequent feeding status classification present for this study was exclusive breastfeeding [n = 183 (18.3%)], followed by exclusive formula bottle feeding [n = 146 (14.6%)], and exclusive feeding of expressed mother's milk [n = 102 (10.2%)], with mixed feeding being the least frequent [n = 23 (2.3%)]. Our final analysis evaluated the classification of clinical notes as breast, formula/bottle, and missing. The machine learning models were trained on these three classes after performing balancing and down sampling. The XGBoost model outperformed all others by achieving an accuracy of 90.1%, a macro-averaged precision of 90.3%, a macro-averaged recall of 90.1%, and a macro-averaged F1 score of 90.1%. Our results demonstrate that natural language processing can be applied to clinical notes stored in the electronic health records to classify infant feeding status. 
Early identification of breastfeeding status using NLP on unstructured electronic health records data can be used to inform precision public health interventions focused on improving lactation support for postpartum patients.
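The abstract above reports accuracy together with macro-averaged precision, recall, and F1 over three classes. As a rough illustration of how such macro averages are formed (each class is scored separately and the per-class scores are then averaged with equal weight), here is a minimal Python sketch. The function name `macro_metrics`, the class labels, and the toy predictions are assumptions made for illustration; they are not the study's data or code.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Illustrative only: per-class scores are averaged with equal weight,
    which is what "macro-averaged" means.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical toy labels, not taken from the paper.
y_true = ["breast", "breast", "formula", "formula", "missing", "missing"]
y_pred = ["breast", "formula", "formula", "formula", "missing", "breast"]
print(macro_metrics(y_true, y_pred))
```

On a balanced corpus such as the one described, macro and class-weighted averages coincide for recall, which is one reason the reported macro recall matches the accuracy.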
Affiliation(s)
- Dominick J Lemas
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA.
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Gainesville, FL, 32610, USA.
- Xinsong Du
- Division of General Internal Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, 02115, USA
- Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
- Masoud Rouhizadeh
- Department of Pharmaceutical Outcomes and Policy, University of Florida College of Medicine, Gainesville, FL, 32610, USA
- Biomedical Informatics and Data Science Section, Division of General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Braeden Lewis
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Simon Frank
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Lauren Wright
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Alex Spirache
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Lisa Gonzalez
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Ryan Cheves
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Marina Magalhães
- Division of Neonatal and Developmental Medicine, Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, 94305, USA
- Ruben Zapata
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Rahul Reddy
- Department of Computer and Information Science, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, 32611, USA
- Ke Xu
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Leslie Parker
- Department of Biobehavioral Nursing Science, University of Florida College of Nursing, Gainesville, FL, 32603, USA
- Chris Harle
- Health Policy and Management Department, Richard M. Fairbanks School of Public Health, Indiana University-Purdue University Indianapolis, Indianapolis, IN, 46202, USA
- Bridget Young
- Division of Breastfeeding and Lactation Medicine, University of Rochester Medical Center, Rochester, NY, 14642, USA
- Adetola Louis-Jaques
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Gainesville, FL, 32610, USA
- Bouri Zhang
- Health Science Center Libraries, University of Florida, Gainesville, FL, 32610, USA
- Lindsay Thompson
- Department of Pediatrics, Wake Forest School of Medicine, Winston-Salem, NC, 27101, USA
- William R Hogan
- Data Science Institute, Medical College of Wisconsin, Milwaukee, WI, 53226, USA
- François Modave
- Department of Anesthesiology, University of Florida College of Medicine, Gainesville, FL, 32610, USA
2
Gallardo-Pizarro A, Peyrony O, Chumbita M, Monzo-Gallo P, Aiello TF, Teijon-Lumbreras C, Gras E, Mensa J, Soriano A, Garcia-Vidal C. Improving management of febrile neutropenia in oncology patients: the role of artificial intelligence and machine learning. Expert Rev Anti Infect Ther 2024; 22:179-187. [PMID: 38457198 DOI: 10.1080/14787210.2024.2322445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 02/20/2024] [Indexed: 03/09/2024]
Abstract
INTRODUCTION Artificial intelligence (AI) and machine learning (ML) have the potential to revolutionize the management of febrile neutropenia (FN) and drive progress toward personalized medicine. AREAS COVERED In this review, we detail how the collection of a large number of high-quality data can be used to conduct precise mathematical studies with ML and AI. We explain the foundations of these techniques, covering the fundamentals of supervised and unsupervised learning, as well as the most important challenges, e.g. data quality, 'black box' model interpretation and overfitting. To conclude, we provide detailed examples of how AI and ML have been used to enhance predictions of chemotherapy-induced FN, detection of bloodstream infections (BSIs) and multidrug-resistant (MDR) bacteria, and anticipation of severe complications and mortality. EXPERT OPINION There is promising potential of implementing accurate AI and ML models whilst managing FN. However, their integration as viable clinical tools poses challenges, including technical and implementation barriers. Improving global accessibility, fostering interdisciplinary collaboration, and addressing ethical and security considerations are essential. By overcoming these challenges, we could transform personalized care for patients with FN.
Affiliation(s)
- Olivier Peyrony
- Hospital Clinic of Barcelona-IDIBAPS, University of Barcelona, Barcelona, Spain
- Mariana Chumbita
- Hospital Clinic of Barcelona-IDIBAPS, University of Barcelona, Barcelona, Spain
- Emmanuelle Gras
- Hospital Clinic of Barcelona-IDIBAPS, University of Barcelona, Barcelona, Spain
- Josep Mensa
- Hospital Clinic of Barcelona-IDIBAPS, University of Barcelona, Barcelona, Spain
- Alex Soriano
- Hospital Clinic of Barcelona-IDIBAPS, University of Barcelona, Barcelona, Spain
3
Xie F, Beukelman T, Sun D, Yun H, Curtis JR. Identifying inpatient mortality in MarketScan claims data using machine learning. Pharmacoepidemiol Drug Saf 2023; 32:1299-1305. [PMID: 37344984 DOI: 10.1002/pds.5658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Revised: 05/24/2023] [Accepted: 06/19/2023] [Indexed: 06/23/2023]
Abstract
PURPOSE Inpatient mortality is an important variable in epidemiology studies using claims data. In 2016, MarketScan data began obscuring specific hospital discharge status types for patient privacy, including inpatient deaths, by setting the values to missing. We used a machine learning approach to correctly identify hospitalizations that resulted in inpatient death using data prior to 2016. METHODS All hospitalizations from 2011 to 2015 with discharge status of missing, died, or one of the other subsequently obscured values were identified and divided into a training set and two test sets. Predictor variables included age, sex, elapsed time from hospital discharge until last observed claim and until healthcare plan disenrollment, and absence of any discharge diagnoses. Four machine learning methods were used to train statistical models and assess sensitivity and positive predictive value (PPV) for inpatient mortality. RESULTS Overall 1 307 917 hospitalizations were included. All four machine learning approaches performed well in all datasets. Random forest performed best with 88% PPV and 93% sensitivity for the training set and both test sets. The two factors with the highest relative importance for identifying inpatient mortality were having no observed claims for the patient on days 2-91 following hospital discharge and patient disenrollment from the healthcare plan within 60 days following hospital discharge. CONCLUSION We successfully developed machine learning algorithms to identify inpatient mortality. This approach can be applied to obscured data to accurately identify inpatient mortality among hospitalizations with missing discharge status.
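The abstract above evaluates its models by sensitivity and positive predictive value (PPV) for inpatient death. As a hedged illustration of how those two quantities are derived from true and predicted labels, the following Python sketch uses a hypothetical helper `sensitivity_and_ppv` and made-up toy labels; it does not reproduce the study's models or data.

```python
def sensitivity_and_ppv(actual, predicted):
    """Compute (sensitivity, PPV) for a binary outcome.

    actual, predicted: sequences of booleans (True = inpatient death).
    Returns None for a metric whose denominator is zero.
    """
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)

    # Sensitivity: share of true deaths that the model flags.
    sensitivity = tp / (tp + fn) if tp + fn else None
    # PPV: share of flagged hospitalizations that are true deaths.
    ppv = tp / (tp + fp) if tp + fp else None
    return sensitivity, ppv


# Hypothetical toy labels for eight hospitalizations.
actual = [True, True, True, False, False, False, False, True]
predicted = [True, True, False, True, False, False, False, True]
print(sensitivity_and_ppv(actual, predicted))
```

Sensitivity depends only on how the true deaths are classified, whereas PPV also depends on how common deaths are among the hospitalizations scored, so the two can diverge when the method is applied to data with a different death rate.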
Affiliation(s)
- Fenglong Xie
- Department of Medicine, Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Foundation for Science, Technology, Education, and Research (FASTER), Birmingham, Alabama, USA
- Timothy Beukelman
- Foundation for Science, Technology, Education, and Research (FASTER), Birmingham, Alabama, USA
- Dongmei Sun
- Department of Medicine, Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Huifeng Yun
- Department of Medicine, Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Jeffrey R Curtis
- Department of Medicine, Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Foundation for Science, Technology, Education, and Research (FASTER), Birmingham, Alabama, USA
4
Le JP, Shashikumar SP, Malhotra A, Nemati S, Wardi G. Making the Improbable Possible: Generalizing Models Designed for a Syndrome-Based, Heterogeneous Patient Landscape. Crit Care Clin 2023; 39:751-768. [PMID: 37704338 PMCID: PMC10758922 DOI: 10.1016/j.ccc.2023.02.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/15/2023]
Abstract
Syndromic conditions, such as sepsis, are commonly encountered in the intensive care unit. Although these conditions are easy for clinicians to grasp, these conditions may limit the performance of machine-learning algorithms. Individual hospital practice patterns may limit external generalizability. Data missingness is another barrier to optimal algorithm performance and various strategies exist to mitigate this. Recent advances in data science, such as transfer learning, conformal prediction, and continual learning, may improve generalizability of machine-learning algorithms in critically ill patients. Randomized trials with these approaches are indicated to demonstrate improvements in patient-centered outcomes at this point.
Affiliation(s)
- Joshua Pei Le
- School of Medicine, University of Limerick, Castletroy, Co, Limerick V94 T9PX, Ireland
- Atul Malhotra
- Division of Pulmonary, Critical Care and Sleep Medicine, University of California San Diego, San Diego, CA, USA
- Shamim Nemati
- Division of Biomedical Informatics, University of California San Diego, San Diego, CA, USA
- Gabriel Wardi
- Division of Pulmonary, Critical Care and Sleep Medicine, University of California San Diego, San Diego, CA, USA; Department of Emergency Medicine, University of California San Diego, 200 W Arbor Drive, San Diego, CA 92103, USA.
5
Ma J, Dhiman P, Qi C, Bullock G, van Smeden M, Riley RD, Collins GS. Poor handling of continuous predictors in clinical prediction models using logistic regression: a systematic review. J Clin Epidemiol 2023; 161:140-151. [PMID: 37536504 DOI: 10.1016/j.jclinepi.2023.07.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 07/20/2023] [Accepted: 07/26/2023] [Indexed: 08/05/2023]
Abstract
BACKGROUND AND OBJECTIVES When developing a clinical prediction model, assuming a linear relationship between the continuous predictors and outcome is not recommended. Incorrect specification of the functional form of continuous predictors could reduce predictive accuracy. We examine how continuous predictors are handled in studies developing a clinical prediction model. METHODS We searched PubMed for clinical prediction model studies developing a logistic regression model for a binary outcome, published between July 01, 2020, and July 30, 2020. RESULTS In total, 118 studies were included in the review (18 studies (15%) assessed the linearity assumption or used methods to handle nonlinearity, and 100 studies (85%) did not). Transformation and splines were commonly used to handle nonlinearity, used in 7 (n = 7/18, 39%) and 6 (n = 6/18, 33%) studies, respectively. Categorization was the most often used method to handle continuous predictors (n = 67/118, 56.8%), and most of these studies used dichotomization (n = 40/67, 60%). Only ten models included nonlinear terms in the final model (n = 10/18, 56%). CONCLUSION Although it is widely recommended not to categorize continuous predictors or assume a linear relationship between outcome and continuous predictors, most studies categorize continuous predictors, few studies assess the linearity assumption, and even fewer use methodology to account for nonlinearity. Methodological guidance is provided to guide researchers on how to handle continuous predictors when developing a clinical prediction model.
Affiliation(s)
- Jie Ma
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, United Kingdom.
- Paula Dhiman
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, United Kingdom
- Cathy Qi
- Population Data Science, Swansea University Medical School, Faculty of Medicine, Health and Life Science, Swansea University, Singleton Park Swansea, SA2 8PP, Swansea, United Kingdom
- Garrett Bullock
- Department of Orthopaedic Surgery, Wake Forest School of Medicine, Winston-Salem, NC, USA; Centre for Sport, Exercise and Osteoarthritis Research Versus Arthritis, University of Oxford, Oxford, United Kingdom
- Maarten van Smeden
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham B15 2TT, United Kingdom
- Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, United Kingdom
6
Padmanabhan R, Elomri A, Taha RY, El Omri H, Elsabah H, El Omri A. Prediction of Multiple Clinical Complications in Cancer Patients to Ensure Hospital Preparedness and Improved Cancer Care. Int J Environ Res Public Health 2022; 20:526. [PMID: 36612856 PMCID: PMC9819091 DOI: 10.3390/ijerph20010526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2022] [Revised: 11/22/2022] [Accepted: 12/04/2022] [Indexed: 06/17/2023]
Abstract
Reliable and rapid medical diagnosis is the cornerstone for improving the survival rate and quality of life of cancer patients. The problem of clinical decision-making pertaining to the management of patients with hematologic cancer is multifaceted and intricate due to the risk of therapy-induced myelosuppression, multiple infections, and febrile neutropenia (FN). Myelosuppression due to treatment increases the risk of sepsis and mortality in hematological cancer patients with febrile neutropenia. A high prevalence of multidrug-resistant organisms is also noted in such patients, which implies that these patients are left with limited or no treatment options amidst severe health complications. Hence, early screening of patients for such organisms is vital to enable hospital preparedness, curtail the spread to other vulnerable patients in hospitals, and limit community outbreaks. Even though predictive models for sepsis and mortality exist, no model has been suggested for the prediction of multidrug-resistant organisms in hematological cancer patients with febrile neutropenia. Hence, to predict three critical clinical complications, namely sepsis, the presence of multidrug-resistant organisms, and mortality, from the data available in medical records, we used 1166 febrile neutropenia episodes reported in 513 patients. The XGBoost algorithm is recommended based on 10-fold cross-validation of 6 candidate models. Other highlights are (1) a novel set of easily available features for the prediction of the aforementioned clinical complications and (2) the use of data augmentation methods and model-scoring-based hyperparameter tuning to address the problem of class disproportionality, a common challenge in medical datasets and often the reason behind the poor event prediction rate of various predictive models reported so far. 
The proposed model depicts improved recall and AUC (area under the curve) for sepsis (recall = 98%, AUC = 0.85), multidrug-resistant organism (recall = 96%, AUC = 0.91), and mortality (recall = 86%, AUC = 0.88) prediction. Our results encourage the need to popularize artificial intelligence-based devices to support clinical decision-making.
Affiliation(s)
- Regina Padmanabhan
- Division of Engineering Management and Decision Sciences, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha 34110, Qatar
- Adel Elomri
- Division of Engineering Management and Decision Sciences, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha 34110, Qatar
- Ruba Yasin Taha
- Department of Hematology and Bone Marrow Transplant, National Center for Cancer Care and Research, Hamad Medical Corporation, Doha 3050, Qatar
- Halima El Omri
- Department of Hematology and Bone Marrow Transplant, National Center for Cancer Care and Research, Hamad Medical Corporation, Doha 3050, Qatar
- Hesham Elsabah
- Department of Hematology and Bone Marrow Transplant, National Center for Cancer Care and Research, Hamad Medical Corporation, Doha 3050, Qatar
- Abdelfatteh El Omri
- Surgical Research Section, Department of Surgery, Hamad Medical Corporation, Doha 3050, Qatar
7
The Prognostic Utility of Lymphocyte-Based Measures and Ratios in Chemotherapy-Induced Febrile Neutropenia Patients following Granulocyte Colony-Stimulating Factor Therapy. Medicina (B Aires) 2022; 58:medicina58111508. [DOI: 10.3390/medicina58111508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2022] [Revised: 10/11/2022] [Accepted: 10/13/2022] [Indexed: 11/05/2022]
Abstract
Background and Objectives: Chemotherapy-induced febrile neutropenia is the most widespread oncologic emergency with high morbidity and mortality rates. Herein we present a retrospective risk factor identification study to evaluate the prognostic role of lymphocyte-based measures and ratios in a cohort of chemotherapy-induced febrile neutropenia patients following granulocyte colony-stimulating factor (G-CSF) therapy. Materials and Methods: The electronic medical records at our center were used to identify patients who had a first attack of chemotherapy-induced febrile neutropenia and were treated accordingly with G-CSF between January 2010 and December 2020. Patients’ demographics and disease characteristics along with laboratory test data were extracted. Prognosis-related indicators were the absolute neutrophil count (ANC) at admission and over the following 6 days, as well as the length of stay and mortality rate. Results: A total of 80 patients were enrolled and divided according to the absolute lymphocyte count at admission into two groups, a lymphopenia group (n = 55) and a non-lymphopenia group (n = 25), with a cutoff point of 700 lymphocytes/μL. Differences in demographics and baseline characteristics between the two groups were generally not significant, but the white blood cell count was higher in the non-lymphopenia group. Differences in ANC, neutrophil percentage, and ANC difference in reference to admission between the two study groups were not significant. The same non-significant pattern was observed for the length of stay and the mortality rate. Univariate analysis using the ANC difference compared to the admission day as the dependent variable revealed no predictive role in the first three days of follow up for any of the variables included. 
However, during the fourth day of follow up, both WBC (OR = 0.261; 95% CI: 0.075, 0.908; p = 0.035) and lymphocyte percentage (OR = 1.074; 95% CI: 1.012, 1.141; p = 0.019) were marginally significant: increasing WBC was associated with a reduction in the likelihood of an ANC increase, whereas increasing lymphocyte percentage was associated with an increase in that likelihood. In comparison, in sequential ANC difference models, lymphocyte percentage (OR = 0.961; 95% CI: 0.932, 0.991; p = 0.011) was associated with a reduction, and monocyte-to-lymphocyte ratio (OR = 7.436; 95% CI: 1.024, 54.020; p = 0.047) with an increment, in the enhancement of ANC levels. On the fifth day, WBC (OR = 0.790; 95% CI: 0.675, 0.925; p = 0.003) significantly decreased the likelihood of ANC increment. Conclusions: We were unable to determine any concrete prognostic role of lymphocyte-related measures and ratios. It is plausible that several limitations could have influenced the results obtained, but as far as our analysis is concerned, the role of ALC as a predictive factor for ANC changes remains questionable.
8
Tu KC, Eric Nyam TT, Wang CC, Chen NC, Chen KT, Chen CJ, Liu CF, Kuo JR. A Computer-Assisted System for Early Mortality Risk Prediction in Patients with Traumatic Brain Injury Using Artificial Intelligence Algorithms in Emergency Room Triage. Brain Sci 2022; 12:brainsci12050612. [PMID: 35624999 PMCID: PMC9138998 DOI: 10.3390/brainsci12050612] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/26/2022] [Accepted: 05/05/2022] [Indexed: 01/27/2023]
Abstract
Traumatic brain injury (TBI) remains a critical public health challenge. Although studies have found several prognostic factors for TBI, a useful early predictive tool for mortality has yet to be developed for emergency room triage. This study aimed to use machine learning algorithms of artificial intelligence (AI) to develop predictive models for TBI patients in emergency room triage. We retrospectively enrolled 18,249 adult TBI patients from the electronic medical records of three hospitals of Chi Mei Medical Group from January 2010 to December 2019, and used 12 potentially predictive feature variables to predict mortality during hospitalization. Six machine learning algorithms, including logistic regression (LR), random forest (RF), support vector machines (SVM), LightGBM, XGBoost, and multilayer perceptron (MLP), were used to build the predictive model. The results showed that all six predictive models had high AUC, ranging from 0.851 to 0.925. Among these models, the LR-based model was the best model for mortality risk prediction with the highest AUC of 0.925; thus, we integrated the best model into the existing hospital information system to assist clinical decision-making. These results revealed that the LR-based model was the best model to predict the mortality risk in patients with TBI in the emergency room. Since the developed prediction system can easily obtain the 12 feature variables during the initial triage, it can provide quick and early mortality prediction to clinicians to guide decisions on further treatment as well as to help explain the patient’s condition to family members.
Affiliation(s)
- Kuan-Chi Tu
- Department of Neurosurgery, Chi Mei Medical Center, Tainan 710402, Taiwan; (K.-C.T.); (T.-T.E.N.); (C.-C.W.)
- Tee-Tau Eric Nyam
- Department of Neurosurgery, Chi Mei Medical Center, Tainan 710402, Taiwan; (K.-C.T.); (T.-T.E.N.); (C.-C.W.)
- Che-Chuan Wang
- Department of Neurosurgery, Chi Mei Medical Center, Tainan 710402, Taiwan; (K.-C.T.); (T.-T.E.N.); (C.-C.W.)
- Center for General Education, Southern Taiwan University of Science and Technology, Tainan 710402, Taiwan
- Nai-Ching Chen
- Department of Nursing, Chi Mei Medical Center, Tainan 710402, Taiwan;
- Kuo-Tai Chen
- Department of Emergency, Chi Mei Medical Center, Tainan 710402, Taiwan;
- Chia-Jung Chen
- Department of Information Systems, Chi Mei Medical Center, Tainan 710402, Taiwan;
- Chung-Feng Liu
- Department of Medical Research, Chi Mei Medical Center, Tainan 710402, Taiwan;
- Jinn-Rung Kuo
- Department of Neurosurgery, Chi Mei Medical Center, Tainan 710402, Taiwan; (K.-C.T.); (T.-T.E.N.); (C.-C.W.)
- Center for General Education, Southern Taiwan University of Science and Technology, Tainan 710402, Taiwan
- Correspondence: ; Tel.: +886-6-281-2811-57423
9
Zhang L, Niu M, Zhang H, Wang Y, Zhang H, Mao Z, Zhang X, He M, Wu T, Wang Z, Wang C. Nonlaboratory-based risk assessment model for coronary heart disease screening: Model development and validation. Int J Med Inform 2022; 162:104746. [PMID: 35325662 DOI: 10.1016/j.ijmedinf.2022.104746] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/30/2021] [Revised: 03/14/2022] [Accepted: 03/15/2022] [Indexed: 12/11/2022]
Abstract
BACKGROUND Identifying groups at high risk of coronary heart disease (CHD) is important to reduce mortality due to CHD. Although machine learning methods have been introduced, many require laboratory or imaging parameters, which are not always readily available; thus, their wide applications are limited. OBJECTIVE The aim of this study was to develop and validate a simple, efficient, and joint machine learning model for identifying individuals at high risk of CHD using easily obtainable nonlaboratory parameters. METHODS This prospective study used data from the Henan Rural Cohort Study, which was conducted in rural areas of Henan Province, China, between July 2015 and September 2017. A joint machine learning model was developed by selecting and combining four base machine learning algorithms, including logistic regression (LR), artificial neural network (ANN), random forest (RF), and gradient boosting machine (GBM). We used readily accessible variables, including demographics, medical and family history, lifestyle and dietary factors, and anthropometric data, to inform the model. The model was also externally validated by a cohort of individuals from the Dongfeng-Tongji cohort study. Model discrimination was assessed by using the area under the receiver operating characteristic curve (AUC), and calibration was measured by using the Brier score (BS). RESULTS A total of 38 716 participants (mean [SD] age, 55.64 [12.19] years; 23 449 [60.6%] female) from the Henan Rural Cohort Study and 17 958 subjects (mean [SD] age, 62.74 [7.59] years; 10 076 [56.1%] female) from the Dongfeng-Tongji cohort study were included in the analysis. Age, waist circumference, pulse pressure, heart rate, family history of CHD, education level, family history of type 2 diabetes mellitus (T2DM), and family history of dyslipidaemia were strongly associated with the development of CHD. 
In regard to internal validation, the model we built demonstrated good discrimination (AUC, 0.844 (95% CI 0.828-0.860)) and had acceptable calibration (BS, 0.066). In regard to external validation, the model performed well with clearly useful discrimination (AUC, 0.792 (95% CI 0.774-0.810)) and robust calibration (BS, 0.069). CONCLUSIONS In this study, the novel, simple machine learning-based model comprising readily accessible variables accurately identified individuals at high risk of CHD. This model has the potential to be widely applied for large-scale screening of CHD populations, especially in medical resource-constrained settings. TRIAL REGISTRATION The Henan Rural Cohort Study has been registered at the Chinese Clinical Trial Register. (Trial registration: ChiCTR-OOC-15006699. Registered 6 July 2015 - Retrospectively registered) http://www.chictr.org.cn/showproj.aspx?proj=11375.
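The abstract above assesses discrimination with the AUC and calibration with the Brier score. To make the two measures concrete, here is a minimal Python sketch: the Brier score is the mean squared difference between predicted probability and observed outcome, and the AUC is computed here as the share of (case, non-case) pairs in which the case receives the higher predicted risk. The function names and the toy probabilities are illustrative assumptions, not the study's implementation or data.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def auc(probs, outcomes):
    """Pairwise AUC: chance that a case outranks a non-case (ties count 0.5)."""
    cases = [p for p, y in zip(probs, outcomes) if y == 1]
    controls = [p for p, y in zip(probs, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical predicted risks and observed outcomes (1 = developed CHD).
probs = [0.9, 0.2, 0.7, 0.1]
outcomes = [1, 0, 1, 0]
print(brier_score(probs, outcomes), auc(probs, outcomes))
```

A lower Brier score indicates predictions closer to the observed outcomes, while an AUC nearer 1 indicates better ranking of cases above non-cases; the pairwise AUC shown is quadratic in sample size and is meant only to expose the definition.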
Affiliation(s)
- Liying Zhang
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan, PR China; Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, PR China
| | - Miaomiao Niu
- Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, PR China
| | - Haiyang Zhang
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan, PR China
| | - Yikang Wang
- Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, PR China
| | - Haiqing Zhang
- Department of Occupational and Environmental Health, Key Laboratory of Environment and Health, Ministry of Education and State Key Laboratory of Environmental Health (Incubating) School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, PR China
| | - Zhenxing Mao
- Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, PR China
| | - Xiaomin Zhang
- Department of Occupational and Environmental Health, Key Laboratory of Environment and Health, Ministry of Education and State Key Laboratory of Environmental Health (Incubating) School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, PR China
| | - Meian He
- Department of Occupational and Environmental Health, Key Laboratory of Environment and Health, Ministry of Education and State Key Laboratory of Environmental Health (Incubating) School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, PR China
| | - Tangchun Wu
- Department of Occupational and Environmental Health, Key Laboratory of Environment and Health, Ministry of Education and State Key Laboratory of Environmental Health (Incubating) School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, PR China
| | - Zhenfei Wang
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan, PR China.
| | - Chongjian Wang
- Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, PR China.
|
10
|
Douthit BJ, Walden RL, Cato K, Coviak CP, Cruz C, D'Agostino F, Forbes T, Gao G, Kapetanovic TA, Lee MA, Pruinelli L, Schultz MA, Wieben A, Jeffery AD. Data Science Trends Relevant to Nursing Practice: A Rapid Review of the 2020 Literature. Appl Clin Inform 2022; 13:161-179. [PMID: 35139564 PMCID: PMC8828453 DOI: 10.1055/s-0041-1742218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
Abstract
BACKGROUND The term "data science" encompasses several methods, many of which are considered cutting edge and are being used to influence care processes across the world. Nursing is an applied science and a key discipline in health care systems in both clinical and administrative areas, making the profession increasingly influenced by the latest advances in data science. The greater informatics community should be aware of current trends regarding the intersection of nursing and data science, as developments in nursing practice have cross-professional implications. OBJECTIVES This study aimed to summarize the latest (calendar year 2020) research and applications of nursing-relevant patient outcomes and clinical processes in the data science literature. METHODS We conducted a rapid review of the literature to identify relevant research published during the year 2020. We explored the following 16 topics: (1) artificial intelligence/machine learning credibility and acceptance, (2) burnout, (3) complex care (outpatient), (4) emergency department visits, (5) falls, (6) health care-acquired infections, (7) health care utilization and costs, (8) hospitalization, (9) in-hospital mortality, (10) length of stay, (11) pain, (12) patient safety, (13) pressure injuries, (14) readmissions, (15) staffing, and (16) unit culture. RESULTS Of 16,589 articles, 244 were included in the review. All topics were represented by literature published in 2020, ranging from 1 article to 59 articles. Numerous contemporary data science methods were represented in the literature including the use of machine learning, neural networks, and natural language processing. CONCLUSION This review provides an overview of the data science trends that were relevant to nursing practice in 2020. Examinations of such literature are important to monitor the status of data science's influence in nursing practice.
Affiliation(s)
- Brian J. Douthit
- Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States
| | - Rachel L. Walden
- Annette and Irwin Eskind Family Biomedical Library, Vanderbilt University, Nashville, Tennessee, United States
| | - Kenrick Cato
- Department of Emergency Medicine, Columbia University School of Nursing, New York, New York, United States
| | - Cynthia P. Coviak
- Professor Emerita of Nursing, Grand Valley State University, Allendale, Michigan, United States
| | - Christopher Cruz
- Global Health Technology and Informatics, Chevron, San Ramon, California, United States
| | - Fabio D'Agostino
- Department of Medicine and Surgery, Saint Camillus International University of Health Sciences, Rome, Italy
| | - Thompson Forbes
- College of Nursing, East Carolina University, Greenville, North Carolina, United States
| | - Grace Gao
- Department of Nursing, St Catherine University, Saint Paul, Minnesota, United States
| | - Theresa A. Kapetanovic
- College of Nursing, East Carolina University, Greenville, North Carolina, United States
| | - Mikyoung A. Lee
- College of Nursing, Texas Woman's University, Denton, Texas, United States
| | - Lisiane Pruinelli
- School of Nursing, University of Minnesota, Minneapolis, Minnesota, United States
| | - Mary A. Schultz
- Department of Nursing, California State University, San Bernardino, California, United States
| | - Ann Wieben
- School of Nursing, University of Wisconsin-Madison, Wisconsin, United States
| | - Alvin D. Jeffery
- School of Nursing, Vanderbilt University; Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, Tennessee, United States. Address for correspondence: Alvin D. Jeffery, PhD, RN-BC, CCRN-K, FNP-BC, 461 21st Avenue South, Nashville, TN 37240, United States
|
11
|
Tedesco S, Andrulli M, Larsson MÅ, Kelly D, Alamäki A, Timmons S, Barton J, Condell J, O’Flynn B, Nordström A. Comparison of Machine Learning Techniques for Mortality Prediction in a Prospective Cohort of Older Adults. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:12806. [PMID: 34886532 PMCID: PMC8657506 DOI: 10.3390/ijerph182312806] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/28/2021] [Revised: 12/01/2021] [Accepted: 12/02/2021] [Indexed: 12/16/2022]
Abstract
As global demographics change, ageing is a global phenomenon which is increasingly of interest in our modern and rapidly changing society. Thus, the application of proper prognostic indices in clinical decisions regarding mortality prediction has assumed a significant importance for personalized risk management (i.e., identifying patients who are at high or low risk of death) and to help ensure effective healthcare services to patients. Consequently, prognostic modelling expressed as all-cause mortality prediction is an important step for effective patient management. Machine learning has the potential to transform prognostic modelling. In this paper, results on the development of machine learning models for all-cause mortality prediction in a cohort of healthy older adults are reported. The models are based on features covering anthropometric variables, physical and lab examinations, questionnaires, and lifestyles, as well as wearable data collected in free-living settings, obtained for the "Healthy Ageing Initiative" study conducted on 2291 recruited participants. Several machine learning techniques including feature engineering, feature selection, data augmentation and resampling were investigated for this purpose. A detailed empirical comparison of the impact of the different techniques is presented and discussed. The achieved performances were also compared with a standard epidemiological model. This investigation showed that, for the dataset under consideration, the best results were achieved with Random UnderSampling in conjunction with Random Forest (either with or without probability calibration). However, while including probability calibration slightly reduced the average performance, it increased the model robustness, as indicated by the lower 95% confidence intervals. 
The analysis showed that machine learning models could provide comparable results to standard epidemiological models while being completely data-driven and disease-agnostic, thus demonstrating the opportunity for building machine learning models on health records data for research and clinical practice. However, further testing is required to significantly improve the model performance and its robustness.
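Random undersampling, the resampling step this abstract reports as working best alongside Random Forest, can be sketched in a few lines. This is a generic illustration, not the authors' implementation: the function name `random_undersample`, the seed, and the toy data are assumptions. Every class is reduced to the size of the rarest class by sampling without replacement. In practice this is applied to the training split only, so that the evaluation data keep their original class balance.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Downsample every class to the size of the rarest class (without replacement)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy imbalanced data invented for illustration: 2 events, 8 non-events.
rows = list(range(10))
labels = [1, 1] + [0] * 8
balanced_rows, balanced_labels = random_undersample(rows, labels)
print(Counter(balanced_labels))
```

The trade-off is that majority-class rows are discarded, which is acceptable when the majority class is large but can cost information in small cohorts.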
Affiliation(s)
- Salvatore Tedesco
- Tyndall National Institute, University College Cork, Lee Maltings Complex, Dyke Parade, T12R5CP Cork, Ireland
| | - Martina Andrulli
- Tyndall National Institute, University College Cork, Lee Maltings Complex, Dyke Parade, T12R5CP Cork, Ireland
| | - Markus Åkerlund Larsson
- Department of Public Health and Clinical Medicine, Section of Sustainable Health, Umeå University, SE-901 87 Umeå, Sweden
| | - Daniel Kelly
- School of Computing, Engineering and Intelligent Systems, Ulster University, Londonderry BT48 7JL, UK
| | - Antti Alamäki
- Department of Physiotherapy, Karelia University of Applied Sciences, Tikkarinne 9, FI-80200 Joensuu, Finland
| | - Suzanne Timmons
- Centre for Gerontology and Rehabilitation, University College Cork, T12XH60 Cork, Ireland
| | - John Barton
- Tyndall National Institute, University College Cork, Lee Maltings Complex, Dyke Parade, T12R5CP Cork, Ireland
| | - Joan Condell
- School of Computing, Engineering and Intelligent Systems, Ulster University, Londonderry BT48 7JL, UK
| | - Brendan O’Flynn
- Tyndall National Institute, University College Cork, Lee Maltings Complex, Dyke Parade, T12R5CP Cork, Ireland
| | - Anna Nordström
- Department of Public Health and Clinical Medicine, Section of Sustainable Health, Umeå University, SE-901 87 Umeå, Sweden
- School of Sport Sciences, UiT the Arctic University of Norway, 9037 Tromsø, Norway
|
12
|
Lure AC, Du X, Black EW, Irons R, Lemas DJ, Taylor JA, Lavilla O, de la Cruz D, Neu J. Using machine learning analysis to assist in differentiating between necrotizing enterocolitis and spontaneous intestinal perforation: A novel predictive analytic tool. J Pediatr Surg 2021; 56:1703-1710. [PMID: 33342603 DOI: 10.1016/j.jpedsurg.2020.11.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 08/12/2020] [Revised: 10/27/2020] [Accepted: 11/07/2020] [Indexed: 02/06/2023]
Abstract
PURPOSE Necrotizing enterocolitis (NEC) and spontaneous intestinal perforation (SIP) are devastating diseases in preterm neonates, often requiring surgical treatment. Previous studies evaluated outcomes in peritoneal drain placement versus laparotomy, but the accuracy of the presumptive diagnosis remains unknown without bowel visualization. Predictive analytics provide the opportunity to determine the etiology of perforation and guide surgical decision making. The purpose of this investigation was to build and evaluate machine learning models to differentiate NEC and SIP. METHODS Neonates who underwent drain placement or laparotomy for NEC or SIP were identified and grouped definitively via bowel visualization. Patient characteristics were analyzed using machine learning methodologies, which were optimized through areas under the receiver operating characteristic curve (AUROC). The model was further evaluated using a validation cohort. RESULTS 40 patients were identified. A random forest model achieved 98% AUROC while a ridge logistic regression model reached 92% AUROC in differentiating diseases. When applying the trained random forest model to the validation cohort, outcomes were correctly predicted. CONCLUSIONS This study supports the feasibility of using a novel machine learning model to differentiate between NEC and SIP prior to any intended surgical interventions. LEVEL OF EVIDENCE Level II. TYPE OF STUDY Clinical Research Paper.
Affiliation(s)
- Allison C Lure
- University of Florida College of Medicine, Department of Pediatrics, 1600 SW Archer Rd, Gainesville, FL 32610, United States.
| | - Xinsong Du
- University of Florida College of Medicine, Department of Health Outcomes & Biomedical Informatics, 2004 Mowry Rd, Gainesville, FL 32610, United States
| | - Erik W Black
- University of Florida College of Medicine, Department of Pediatrics, 1600 SW Archer Rd, Gainesville, FL 32610, United States; University of Florida College of Education, 1221 SW 5th Ave, Gainesville, FL 32601, United States
| | - Raechel Irons
- University of Florida College of Medicine, Department of Pediatrics, 1600 SW Archer Rd, Gainesville, FL 32610, United States
| | - Dominick J Lemas
- University of Florida College of Medicine, Department of Health Outcomes & Biomedical Informatics, 2004 Mowry Rd, Gainesville, FL 32610, United States
| | - Janice A Taylor
- University of Florida College of Medicine, Department of Surgery, 1600 SW Archer Rd, Gainesville, FL 32610, United States
| | - Orlyn Lavilla
- University of Florida College of Medicine, Department of Pediatrics, 1600 SW Archer Rd, Gainesville, FL 32610, United States
| | - Diomel de la Cruz
- University of Florida College of Medicine, Department of Pediatrics, 1600 SW Archer Rd, Gainesville, FL 32610, United States
| | - Josef Neu
- University of Florida College of Medicine, Department of Pediatrics, 1600 SW Archer Rd, Gainesville, FL 32610, United States
|
13
|
Satheeshkumar PS, El-Dallal M, Mohan MP. Feature selection and predicting chemotherapy-induced ulcerative mucositis using machine learning methods. Int J Med Inform 2021; 154:104563. [PMID: 34479094 DOI: 10.1016/j.ijmedinf.2021.104563] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/26/2021] [Revised: 08/17/2021] [Accepted: 08/24/2021] [Indexed: 11/28/2022]
Abstract
OBJECTIVE Ulcerative mucositis (UM) is a devastating complication of most cancer therapies with less recognized risk factors. Because risk prediction is vital for adverse events, we used machine learning (ML) approaches to predict chemotherapy-induced UM. METHODS We used the 2017 National Inpatient Sample database to identify discharges with antineoplastic chemotherapy-induced UM among those who received chemotherapy as part of their cancer treatment. We used forward selection and backward elimination for feature selection; lasso and the Gradient Boosting Method (GBM) were used for building our linear and non-linear models. RESULTS In 2017, there were 253 (unweighted numbers) chemotherapy-induced UM patient discharges from 21,626 (unweighted numbers) adult patients who received antineoplastic chemotherapy as part of their cancer treatment. Our linear model, lasso, showed performance (C-statistic) of AUC 0.75 in both the test and training datasets; the GBM model showed AUC 0.76 in the training and 0.79 in the test datasets. Feature selection by stepwise forward selection and backward elimination identified the following variables of importance: antineoplastic chemotherapy-induced pancytopenia, agranulocytosis due to cancer chemotherapy, fluid and electrolyte imbalance, age, anemia due to chemotherapy, median household income, and depression. The higher-importance variables derived from GBM were, in order of importance, antineoplastic chemotherapy-induced pancytopenia > co-morbidity score > agranulocytosis due to cancer chemotherapy > age > fluid and electrolyte imbalance. Further, when the analysis was stratified to females only, the ML models performed better than the unstratified model. CONCLUSION Our study showed that ML methods performed well in predicting chemotherapy-induced UM. Predictors identified through the ML approach matched the clinically meaningful and previously discussed predictors of chemotherapy-induced UM.
Affiliation(s)
- Poolakkad S Satheeshkumar
- Harvard Medical School, Boston, MA, USA; Department of Oral Oncology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA.
| | - Mohammed El-Dallal
- Division of Hospital Medicine, Cambridge Health Alliance and Harvard Medical School, Cambridge, MA, USA; Division of Gastroenterology, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, MA, USA
| | - Minu P Mohan
- University of Massachusetts, Lowell, MA 01854, USA.
|
14
|
Understanding current states of machine learning approaches in medical informatics: a systematic literature review. HEALTH AND TECHNOLOGY 2021. [DOI: 10.1007/s12553-021-00538-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/25/2022]
|
15
|
van der Wall HEC, Doll RJ, van Westen GJP, Koopmans I, Zuiker RG, Burggraaf J, Cohen AF. The use of machine learning improves the assessment of drug-induced driving behaviour. ACCIDENT; ANALYSIS AND PREVENTION 2020; 148:105822. [PMID: 33125924 DOI: 10.1016/j.aap.2020.105822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2020] [Revised: 09/22/2020] [Accepted: 09/30/2020] [Indexed: 06/11/2023]
Abstract
RATIONALE Car-driving performance is negatively affected by the intake of alcohol, tranquillizers, and sedatives, and by sleep deprivation. Although several studies have shown that the standard deviation of the lateral position on the road (SDLP) is sensitive to drug-induced changes in simulated and real driving performance tests, this parameter alone might not fully assess and quantify deviant or unsafe driving. OBJECTIVE Using machine learning, we investigated whether including multiple simulator-derived parameters, rather than the SDLP alone, would provide a more accurate assessment of the effect of substances affecting driving performance. We specifically analysed the effects of alcohol and alprazolam. METHODS The data used in the present study were collected during a previous study on driving effects of alcohol and alprazolam in 24 healthy subjects (12 M, 12 F, mean age 26 years, range 20-43 years). Various driving features, such as speed and steering variations, were quantified and the influence of administration of alcohol or alprazolam was assessed to assist in designing a predictive model for abnormal driving behaviour. RESULTS Adding features besides the SDLP increased the model performance for prediction of drug-induced abnormal driving behaviour (accuracy rose from 65% to 83% after alprazolam intake and from 50% to 76% after alcohol ingestion). Driving behaviour influenced by alcohol and alprazolam was characterised by different feature importance, indicating that the two interventions influenced driving behaviour in different ways. CONCLUSION Machine learning using multiple driving features in addition to the state-of-the-art SDLP improves the assessment of drug-induced abnormal driving behaviour. The created models may facilitate quantitative description of abnormal driving behaviour in the development and application of psychopharmacological medicines. Our models require further validation using similar and unknown interventions.
Affiliation(s)
- H E C van der Wall
- Centre for Human Drug Research, Leiden, the Netherlands; Leiden Academic Centre for Drug Research, Leiden, the Netherlands.
| | - R J Doll
- Centre for Human Drug Research, Leiden, the Netherlands
| | - G J P van Westen
- Leiden Academic Centre for Drug Research, Leiden, the Netherlands
| | - I Koopmans
- Centre for Human Drug Research, Leiden, the Netherlands
| | - R G Zuiker
- Centre for Human Drug Research, Leiden, the Netherlands
| | - J Burggraaf
- Centre for Human Drug Research, Leiden, the Netherlands; Leiden Academic Centre for Drug Research, Leiden, the Netherlands; Leiden University Medical Centre, Leiden, the Netherlands
| | - A F Cohen
- Centre for Human Drug Research, Leiden, the Netherlands; Leiden Academic Centre for Drug Research, Leiden, the Netherlands; Leiden University Medical Centre, Leiden, the Netherlands
|
16
|
Wollenstein-Betech S, Cassandras CG, Paschalidis IC. Personalized predictive models for symptomatic COVID-19 patients using basic preconditions: Hospitalizations, mortality, and the need for an ICU or ventilator. Int J Med Inform 2020; 142:104258. [PMID: 32927229 PMCID: PMC7442577 DOI: 10.1016/j.ijmedinf.2020.104258] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 05/03/2020] [Revised: 07/26/2020] [Accepted: 08/17/2020] [Indexed: 01/08/2023]
Abstract
BACKGROUND The rapid global spread of the SARS-CoV-2 virus has provoked a spike in demand for hospital care. Hospital systems across the world have been over-extended, including in Northern Italy, Ecuador, and New York City, and many other systems face similar challenges. As a result, decisions on how to best allocate very limited medical resources and design targeted policies for vulnerable subgroups have come to the forefront. Specifically, under consideration are decisions on who to test, who to admit into hospitals, who to treat in an Intensive Care Unit (ICU), and who to support with a ventilator. Given today's ability to gather, share, analyze and process data, personalized predictive models based on demographics and information regarding prior conditions can be used to (1) help decision-makers allocate limited resources, when needed, (2) advise individuals how to better protect themselves given their risk profile, (3) differentiate social distancing guidelines based on risk, and (4) prioritize vaccinations once a vaccine becomes available. OBJECTIVE To develop personalized models that predict the following events: (1) hospitalization, (2) mortality, (3) need for ICU, and (4) need for a ventilator. To predict hospitalization, it is assumed that one has access to a patient's basic preconditions, which can be easily gathered without the need to be at a hospital and hence serve citizens and policy makers to assess individual risk during a pandemic. For the remaining models, different versions developed include different sets of a patient's features, with some including information on how the disease is progressing (e.g., diagnosis of pneumonia). MATERIALS AND METHODS National data from a publicly available repository, updated daily, containing information from approximately 91,000 patients in Mexico were used. 
The data for each patient include demographics, prior medical conditions, SARS-CoV-2 test results, hospitalization, mortality and whether a patient has developed pneumonia or not. Several classification methods were applied and compared, including robust versions of logistic regression, and support vector machines, as well as random forests and gradient boosted decision trees. RESULTS Interpretable methods (logistic regression and support vector machines) perform just as well as more complex models in terms of accuracy and detection rates, with the additional benefit of elucidating variables on which the predictions are based. Classification accuracies reached 72%, 79%, 89%, and 90% for predicting hospitalization, mortality, need for ICU and need for a ventilator, respectively. The analysis reveals the most important preconditions for making the predictions. For the four models derived, these are: (1) for hospitalization: age, pregnancy, diabetes, gender, chronic renal insufficiency, and immunosuppression; (2) for mortality: age, immunosuppression, chronic renal insufficiency, obesity and diabetes; (3) for ICU need: development of pneumonia (if available), age, obesity, diabetes and hypertension; and (4) for ventilator need: ICU and pneumonia (if available), age, obesity, and hypertension.
Affiliation(s)
- Salomón Wollenstein-Betech
- Department of Electrical & Computer Engineering, Division of Systems Engineering, Boston University, 8 Saint Mary's St., Boston, MA 02215, USA
| | - Christos G Cassandras
- Department of Electrical & Computer Engineering, Division of Systems Engineering, Boston University, 8 Saint Mary's St., Boston, MA 02215, USA
| | - Ioannis Ch Paschalidis
- Department of Electrical & Computer Engineering, Division of Systems Engineering, Boston University, 8 Saint Mary's St., Boston, MA 02215, USA; Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215, USA.
|
17
|
Fu Y, Yang B, Ma Y, Sun Q, Yao J, Fu W, Yin W. Effect of particle size on magnesite flotation based on kinetic studies and machine learning simulation. POWDER TECHNOL 2020. [DOI: 10.1016/j.powtec.2020.08.054] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/29/2022]
|
18
|
Wollenstein-Betech S, Cassandras CG, Paschalidis IC. Personalized Predictive Models for Symptomatic COVID-19 Patients Using Basic Preconditions: Hospitalizations, Mortality, and the Need for an ICU or Ventilator. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2020:2020.05.03.20089813. [PMID: 32511489 PMCID: PMC7273257 DOI: 10.1101/2020.05.03.20089813] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 01/08/2023]
Abstract
BACKGROUND The rapid global spread of the virus SARS-CoV-2 has provoked a spike in demand for hospital care. Hospital systems across the world have been over-extended, including in Northern Italy, Ecuador, and New York City, and many other systems face similar challenges. As a result, decisions on how to best allocate very limited medical resources have come to the forefront. Specifically, under consideration are decisions on who to test, who to admit into hospitals, who to treat in an Intensive Care Unit (ICU), and who to support with a ventilator. Given today's ability to gather, share, analyze and process data, personalized predictive models based on demographics and information regarding prior conditions can be used to (1) help decision-makers allocate limited resources, when needed, (2) advise individuals how to better protect themselves given their risk profile, (3) differentiate social distancing guidelines based on risk, and (4) prioritize vaccinations once a vaccine becomes available. OBJECTIVE To develop personalized models that predict the following events: (1) hospitalization, (2) mortality, (3) need for ICU, and (4) need for a ventilator. To predict hospitalization, it is assumed that one has access to a patient's basic preconditions, which can be easily gathered without the need to be at a hospital. For the remaining models, different versions developed include different sets of a patient's features, with some including information on how the disease is progressing (e.g., diagnosis of pneumonia). MATERIALS AND METHODS Data from a publicly available repository, updated daily, containing information from approximately 91,000 patients in Mexico were used. The data for each patient include demographics, prior medical conditions, SARS-CoV-2 test results, hospitalization, mortality and whether a patient has developed pneumonia or not. 
Several classification methods were applied, including robust versions of logistic regression, and support vector machines, as well as random forests and gradient boosted decision trees. RESULTS Interpretable methods (logistic regression and support vector machines) perform just as well as more complex models in terms of accuracy and detection rates, with the additional benefit of elucidating variables on which the predictions are based. Classification accuracies reached 61%, 76%, 83%, and 84% for predicting hospitalization, mortality, need for ICU and need for a ventilator, respectively. The analysis reveals the most important preconditions for making the predictions. For the four models derived, these are: (1) for hospitalization: age, gender, chronic renal insufficiency, diabetes, immunosuppression; (2) for mortality: age, SARS-CoV-2 test status, immunosuppression and pregnancy; (3) for ICU need: development of pneumonia (if available), cardiovascular disease, asthma, and SARS-CoV-2 test status; and (4) for ventilator need: ICU and pneumonia (if available), age, gender, cardiovascular disease, obesity, pregnancy, and SARS-CoV-2 test result.
Affiliation(s)
- Salomón Wollenstein-Betech
- Division of Systems Engineering, Department of Electrical and Computer Engineering, Boston University, Boston, MA 02215
| | - Christos G Cassandras
- Division of Systems Engineering, Department of Electrical and Computer Engineering, Boston University, Boston, MA 02215
| | - Ioannis Ch Paschalidis
- Division of Systems Engineering, Department of Electrical and Computer Engineering, Department of Biomedical Engineering, Boston University, Boston, MA 02215
|