Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Du X, Min J, Shah CP, Bishnoi R, Hogan WR, Lemas DJ. Predicting in-hospital mortality of patients with febrile neutropenia using machine learning models. Int J Med Inform 2020;139:104140. [PMID: 32325370 DOI: 10.1016/j.ijmedinf.2020.104140] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 12/18/2019] [Revised: 03/12/2020] [Accepted: 04/03/2020] [Indexed: 11/30/2022]

Cited by Other Article(s)

Lemas DJ, Du X, Rouhizadeh M, Lewis B, Frank S, Wright L, Spirache A, Gonzalez L, Cheves R, Magalhães M, Zapata R, Reddy R, Xu K, Parker L, Harle C, Young B, Louis-Jaques A, Zhang B, Thompson L, Hogan WR, Modave F. Classifying early infant feeding status from clinical notes using natural language processing and machine learning. Sci Rep 2024;14:7831. [PMID: 38570569 PMCID: PMC10991582 DOI: 10.1038/s41598-024-58299-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 03/27/2024] [Indexed: 04/05/2024]

Abstract

The objective of this study is to develop and evaluate natural language processing (NLP) and machine learning models to predict infant feeding status from clinical notes in the Epic electronic health records system. The primary outcome was the classification of infant feeding status from clinical notes using Medical Subject Headings (MeSH) terms. Annotation of notes was completed using TeamTat to uniquely classify clinical notes according to infant feeding status. We trained 6 machine learning models to classify infant feeding status: logistic regression, random forest, XGBoost gradient descent, k-nearest neighbors, and support-vector classifier. Model comparison was evaluated based on overall accuracy, precision, recall, and F1 score. Our modeling corpus included an even number of clinical notes that was a balanced sample across each class. We manually reviewed 999 notes that represented 746 mother-infant dyads with a mean gestational age of 38.9 weeks and a mean maternal age of 26.6 years. The most frequent feeding status classification present for this study was exclusive breastfeeding [n = 183 (18.3%)], followed by exclusive formula bottle feeding [n = 146 (14.6%)], and exclusive feeding of expressed mother's milk [n = 102 (10.2%)], with mixed feeding being the least frequent [n = 23 (2.3%)]. Our final analysis evaluated the classification of clinical notes as breast, formula/bottle, and missing. The machine learning models were trained on these three classes after performing balancing and down sampling. The XGBoost model outperformed all others by achieving an accuracy of 90.1%, a macro-averaged precision of 90.3%, a macro-averaged recall of 90.1%, and a macro-averaged F1 score of 90.1%. Our results demonstrate that natural language processing can be applied to clinical notes stored in the electronic health records to classify infant feeding status. 
Early identification of breastfeeding status using NLP on unstructured electronic health records data can be used to inform precision public health interventions focused on improving lactation support for postpartum patients.
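
The macro-averaged metrics reported in the abstract above can be illustrated with a minimal sketch. It assumes a three-class problem with the hypothetical labels "breast", "formula", and "missing" and uses invented toy predictions; it is not the authors' code or data. Macro averaging computes each metric per class and then takes the unweighted mean across classes.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels (hypothetical, for illustration only).
y_true = ["breast", "breast", "formula", "formula", "missing", "missing"]
y_pred = ["breast", "formula", "formula", "formula", "missing", "breast"]
accuracy, macro_p, macro_r, macro_f1 = macro_metrics(y_true, y_pred)
```

Because every class counts equally in the macro average, the balanced, down-sampled corpus described in the abstract makes the macro scores track overall accuracy closely.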

Affiliation(s)

Dominick J Lemas Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA; Department of Obstetrics and Gynecology, University of Florida College of Medicine, Gainesville, FL, 32610, USA
Xinsong Du Division of General Internal Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
Masoud Rouhizadeh Department of Pharmaceutical Outcomes and Policy, University of Florida College of Medicine, Gainesville, FL, 32610, USA; Biomedical Informatics and Data Science Section, Division of General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
Braeden Lewis Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
Simon Frank Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
Lauren Wright Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
Alex Spirache Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
Lisa Gonzalez Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
Ryan Cheves Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
Marina Magalhães Division of Neonatal and Developmental Medicine, Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, 94305, USA
Ruben Zapata Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
Rahul Reddy Department of Computer and Information Science, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, 32611, USA
Ke Xu Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
Leslie Parker Department of Biobehavioral Nursing Science, University of Florida College of Nursing, Gainesville, FL, 32603, USA
Chris Harle Health Policy and Management Department, Richard M. Fairbanks School of Public Health, Indiana University-Purdue University Indianapolis, Indianapolis, IN, 46202, USA
Bridget Young Division of Breastfeeding and Lactation Medicine, University of Rochester Medical Center, Rochester, NY, 14642, USA
Adetola Louis-Jaques Department of Obstetrics and Gynecology, University of Florida College of Medicine, Gainesville, FL, 32610, USA
Bouri Zhang Health Science Center Libraries, University of Florida, Gainesville, FL, 32610, USA
Lindsay Thompson Department of Pediatrics, Wake Forest School of Medicine, Winston-Salem, NC, 27101, USA
William R Hogan Data Science Institute, Medical College of Wisconsin, Milwaukee, WI, 53226, USA
François Modave Department of Anesthesiology, University of Florida College of Medicine, Gainesville, FL, 32610, USA

Gallardo-Pizarro A, Peyrony O, Chumbita M, Monzo-Gallo P, Aiello TF, Teijon-Lumbreras C, Gras E, Mensa J, Soriano A, Garcia-Vidal C. Improving management of febrile neutropenia in oncology patients: the role of artificial intelligence and machine learning. Expert Rev Anti Infect Ther 2024;22:179-187. [PMID: 38457198 DOI: 10.1080/14787210.2024.2322445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 02/20/2024] [Indexed: 03/09/2024]

Xie F, Beukelman T, Sun D, Yun H, Curtis JR. Identifying inpatient mortality in MarketScan claims data using machine learning. Pharmacoepidemiol Drug Saf 2023;32:1299-1305. [PMID: 37344984 DOI: 10.1002/pds.5658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Revised: 05/24/2023] [Accepted: 06/19/2023] [Indexed: 06/23/2023]

Le JP, Shashikumar SP, Malhotra A, Nemati S, Wardi G. Making the Improbable Possible: Generalizing Models Designed for a Syndrome-Based, Heterogeneous Patient Landscape. Crit Care Clin 2023;39:751-768. [PMID: 37704338 PMCID: PMC10758922 DOI: 10.1016/j.ccc.2023.02.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/15/2023]

Ma J, Dhiman P, Qi C, Bullock G, van Smeden M, Riley RD, Collins GS. Poor handling of continuous predictors in clinical prediction models using logistic regression: a systematic review. J Clin Epidemiol 2023;161:140-151. [PMID: 37536504 DOI: 10.1016/j.jclinepi.2023.07.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 07/20/2023] [Accepted: 07/26/2023] [Indexed: 08/05/2023]

Padmanabhan R, Elomri A, Taha RY, El Omri H, Elsabah H, El Omri A. Prediction of Multiple Clinical Complications in Cancer Patients to Ensure Hospital Preparedness and Improved Cancer Care. Int J Environ Res Public Health 2022;20:526. [PMID: 36612856 PMCID: PMC9819091 DOI: 10.3390/ijerph20010526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2022] [Revised: 11/22/2022] [Accepted: 12/04/2022] [Indexed: 06/17/2023]

Abstract

Reliable and rapid medical diagnosis is the cornerstone for improving the survival rate and quality of life of cancer patients. The problem of clinical decision-making pertaining to the management of patients with hematologic cancer is multifaceted and intricate due to the risk of therapy-induced myelosuppression, multiple infections, and febrile neutropenia (FN). Myelosuppression due to treatment increases the risk of sepsis and mortality in hematological cancer patients with febrile neutropenia. A high prevalence of multidrug-resistant organisms is also noted in such patients, which implies that these patients are left with limited or no treatment options amidst severe health complications. Hence, early screening of patients for such organisms in their bodies is vital to enable hospital preparedness, curtail the spread to other weak patients in hospitals, and limit community outbreaks. Even though predictive models for sepsis and mortality exist, no model has been suggested for the prediction of multidrug-resistant organisms in hematological cancer patients with febrile neutropenia. Hence, to predict three critical clinical complications, namely sepsis, the presence of multidrug-resistant organisms, and mortality, from the data available in medical records, we used 1166 febrile neutropenia episodes reported in 513 patients. The XGBoost algorithm is suggested from 10-fold cross-validation on 6 candidate models. Other highlights are (1) a novel set of easily available features for the prediction of the aforementioned clinical complications and (2) the use of data augmentation methods and model-scoring-based hyperparameter tuning to address the problem of class disproportionality, a common challenge in medical datasets and often the reason behind the poor event prediction rates of various predictive models reported so far. The proposed model shows improved recall and AUC (area under the curve) for sepsis (recall = 98%, AUC = 0.85), multidrug-resistant organism (recall = 96%, AUC = 0.91), and mortality (recall = 86%, AUC = 0.88) prediction. Our results encourage the need to popularize artificial intelligence-based devices to support clinical decision-making.

The Prognostic Utility of Lymphocyte-Based Measures and Ratios in Chemotherapy-Induced Febrile Neutropenia Patients following Granulocyte Colony-Stimulating Factor Therapy. Medicina (Kaunas) 2022;58:1508. [DOI: 10.3390/medicina58111508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2022] [Revised: 10/11/2022] [Accepted: 10/13/2022] [Indexed: 11/05/2022]

Abstract

Background and Objectives: Chemotherapy-induced febrile neutropenia is the most widespread oncologic emergency with high morbidity and mortality rates. Herein we present a retrospective risk factor identification study to evaluate the prognostic role of lymphocyte-based measures and ratios in a cohort of chemotherapy-induced febrile neutropenia patients following granulocyte colony-stimulating factor (G-CSF) therapy. Materials and Methods: The electronic medical records at our center were utilized to identify patients who had a first attack of chemotherapy-induced febrile neutropenia and were treated accordingly with G-CSF between January 2010 and December 2020. Patients' demographics and disease characteristics along with laboratory test data were extracted. Prognosis-related indicators were the absolute neutrophil count (ANC) at admission and over the following 6 days, as well as the length of stay and mortality rate. Results: A total of 80 patients were enrolled and divided according to the absolute lymphocyte count (ALC) at admission into two groups, a lymphopenia group (n = 55) and a non-lymphopenia group (n = 25), with a cutoff point of 700 lymphocytes/μL. Differences in demographics and baseline characteristics between the two groups were generally insignificant, but the white blood cell (WBC) count was higher in the non-lymphopenia group. Differences between the two study groups in ANC, neutrophil percentage, and ANC difference in reference to admission were entirely insignificant. The same insignificant pattern was observed in the length of stay and the mortality rate. Univariate analysis utilizing the ANC difference compared to the admission day as the dependent variable revealed no predictive role in the first three days of follow-up for any of the variables included. However, during the fourth day of follow-up, both WBC (OR = 0.261; 95% CI: 0.075, 0.908; p = 0.035) and lymphocyte percentage (OR = 1.074; 95% CI: 1.012, 1.141; p = 0.019) were marginally significant, in which increasing WBC was associated with a reduction in the likelihood of ANC count increase, compared to the lymphocyte percentage, which exhibited an increase in the likelihood. In comparison, sequential ANC difference models demonstrated lymphocyte percentage (OR = 0.961; 95% CI: 0.932, 0.991; p = 0.011) and monocyte-to-lymphocyte ratio (OR = 7.436; 95% CI: 1.024, 54.020; p = 0.047) reduction and increment in the enhancement of ANC levels, respectively. On the fifth day, WBC (OR = 0.790; 95% CI: 0.675, 0.925; p = 0.003) significantly decreased the likelihood of ANC increment. Conclusions: We were unable to determine any concrete prognostic role of lymphocyte-related measures and ratios. It is plausible that several limitations could have influenced the results obtained, but as far as our analysis is concerned, the role of ALC as a predictive factor for ANC changes remains questionable.

Tu KC, Eric Nyam TT, Wang CC, Chen NC, Chen KT, Chen CJ, Liu CF, Kuo JR. A Computer-Assisted System for Early Mortality Risk Prediction in Patients with Traumatic Brain Injury Using Artificial Intelligence Algorithms in Emergency Room Triage. Brain Sci 2022;12:612. [PMID: 35624999 PMCID: PMC9138998 DOI: 10.3390/brainsci12050612] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/26/2022] [Accepted: 05/05/2022] [Indexed: 01/27/2023]

Zhang L, Niu M, Zhang H, Wang Y, Zhang H, Mao Z, Zhang X, He M, Wu T, Wang Z, Wang C. Nonlaboratory-based risk assessment model for coronary heart disease screening: Model development and validation. Int J Med Inform 2022;162:104746. [PMID: 35325662 DOI: 10.1016/j.ijmedinf.2022.104746] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/30/2021] [Revised: 03/14/2022] [Accepted: 03/15/2022] [Indexed: 12/11/2022]

Abstract

BACKGROUND

Identifying groups at high risk of coronary heart disease (CHD) is important to reduce mortality due to CHD. Although machine learning methods have been introduced, many require laboratory or imaging parameters, which are not always readily available; thus, their wide applications are limited.

OBJECTIVE

The aim of this study was to develop and validate a simple, efficient, and joint machine learning model for identifying individuals at high risk of CHD using easily obtainable nonlaboratory parameters.

METHODS

This prospective study used data from the Henan Rural Cohort Study, which was conducted in rural areas of Henan Province, China, between July 2015 and September 2017. A joint machine learning model was developed by selecting and combining four base machine learning algorithms, including logistic regression (LR), artificial neural network (ANN), random forest (RF), and gradient boosting machine (GBM). We used readily accessible variables, including demographics, medical and family history, lifestyle and dietary factors, and anthropometric data, to inform the model. The model was also externally validated by a cohort of individuals from the Dongfeng-Tongji cohort study. Model discrimination was assessed by using the area under the receiver operating characteristic curve (AUC), and calibration was measured by using the Brier score (BS).
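
The two evaluation measures named above, AUC for discrimination and the Brier score for calibration, can be sketched in a few lines. This is a minimal illustration on invented toy values, not the study's data or code. The AUC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher predicted risk (ties count as half), and the Brier score is the mean squared difference between predicted probability and observed outcome.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the probability that a positive case outranks a negative case."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Toy outcomes and predicted risks (hypothetical, for illustration only).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_prob)
bs = brier_score(y_true, y_prob)
```

A lower Brier score indicates better-calibrated probabilities, while a higher AUC indicates better ranking of cases above non-cases; the two can move independently, which is why both are reported.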

RESULTS

A total of 38,716 participants (mean [SD] age, 55.64 [12.19] years; 23,449 [60.6%] female) from the Henan Rural Cohort Study and 17,958 subjects (mean [SD] age, 62.74 [7.59] years; 10,076 [56.1%] female) from the Dongfeng-Tongji cohort study were included in the analysis. Age, waist circumference, pulse pressure, heart rate, family history of CHD, education level, family history of type 2 diabetes mellitus (T2DM), and family history of dyslipidaemia were strongly associated with the development of CHD. In regard to internal validation, the model we built demonstrated good discrimination (AUC, 0.844 (95% CI 0.828-0.860)) and had acceptable calibration (BS, 0.066). In regard to external validation, the model performed well with clearly useful discrimination (AUC, 0.792 (95% CI 0.774-0.810)) and robust calibration (BS, 0.069).

CONCLUSIONS

In this study, the novel, simple machine learning-based model comprising readily accessible variables accurately identified individuals at high risk of CHD. This model has the potential to be widely applied for large-scale screening of CHD populations, especially in medical resource-constrained settings.

TRIAL REGISTRATION

The Henan Rural Cohort Study has been registered at the Chinese Clinical Trial Register. (Trial registration: ChiCTR-OOC-15006699. Registered 6 July 2015 - Retrospectively registered) http://www.chictr.org.cn/showproj.aspx?proj=11375.

Affiliation(s)

Liying Zhang School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan, PR China; Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, PR China
Miaomiao Niu Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, PR China
Haiyang Zhang School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan, PR China
Yikang Wang Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, PR China
Haiqing Zhang Department of Occupational and Environmental Health, Key Laboratory of Environment and Health, Ministry of Education and State Key Laboratory of Environmental Health (Incubating) School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, PR China
Zhenxing Mao Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, PR China
Xiaomin Zhang Department of Occupational and Environmental Health, Key Laboratory of Environment and Health, Ministry of Education and State Key Laboratory of Environmental Health (Incubating) School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, PR China
Meian He Department of Occupational and Environmental Health, Key Laboratory of Environment and Health, Ministry of Education and State Key Laboratory of Environmental Health (Incubating) School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, PR China
Tangchun Wu Department of Occupational and Environmental Health, Key Laboratory of Environment and Health, Ministry of Education and State Key Laboratory of Environmental Health (Incubating) School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, PR China
Zhenfei Wang School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan, PR China
Chongjian Wang Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, PR China

Douthit BJ, Walden RL, Cato K, Coviak CP, Cruz C, D'Agostino F, Forbes T, Gao G, Kapetanovic TA, Lee MA, Pruinelli L, Schultz MA, Wieben A, Jeffery AD. Data Science Trends Relevant to Nursing Practice: A Rapid Review of the 2020 Literature. Appl Clin Inform 2022;13:161-179. [PMID: 35139564 PMCID: PMC8828453 DOI: 10.1055/s-0041-1742218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]

Abstract

BACKGROUND

The term "data science" encompasses several methods, many of which are considered cutting edge and are being used to influence care processes across the world. Nursing is an applied science and a key discipline in health care systems in both clinical and administrative areas, making the profession increasingly influenced by the latest advances in data science. The greater informatics community should be aware of current trends regarding the intersection of nursing and data science, as developments in nursing practice have cross-professional implications.

OBJECTIVES

This study aimed to summarize the latest (calendar year 2020) research and applications of nursing-relevant patient outcomes and clinical processes in the data science literature.

METHODS

We conducted a rapid review of the literature to identify relevant research published during the year 2020. We explored the following 16 topics: (1) artificial intelligence/machine learning credibility and acceptance, (2) burnout, (3) complex care (outpatient), (4) emergency department visits, (5) falls, (6) health care-acquired infections, (7) health care utilization and costs, (8) hospitalization, (9) in-hospital mortality, (10) length of stay, (11) pain, (12) patient safety, (13) pressure injuries, (14) readmissions, (15) staffing, and (16) unit culture.

RESULTS

Of 16,589 articles, 244 were included in the review. All topics were represented by literature published in 2020, ranging from 1 article to 59 articles. Numerous contemporary data science methods were represented in the literature including the use of machine learning, neural networks, and natural language processing.

CONCLUSION

This review provides an overview of the data science trends that were relevant to nursing practice in 2020. Examinations of such literature are important to monitor the status of data science's influence in nursing practice.

Tedesco S, Andrulli M, Larsson MÅ, Kelly D, Alamäki A, Timmons S, Barton J, Condell J, O’Flynn B, Nordström A. Comparison of Machine Learning Techniques for Mortality Prediction in a Prospective Cohort of Older Adults. Int J Environ Res Public Health 2021;18:12806. [PMID: 34886532 PMCID: PMC8657506 DOI: 10.3390/ijerph182312806] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/28/2021] [Revised: 12/01/2021] [Accepted: 12/02/2021] [Indexed: 12/16/2022]

Abstract

As global demographics change, ageing is a global phenomenon which is increasingly of interest in our modern and rapidly changing society. Thus, the application of proper prognostic indices in clinical decisions regarding mortality prediction has assumed a significant importance for personalized risk management (i.e., identifying patients who are at high or low risk of death) and to help ensure effective healthcare services to patients. Consequently, prognostic modelling expressed as all-cause mortality prediction is an important step for effective patient management. Machine learning has the potential to transform prognostic modelling. In this paper, results on the development of machine learning models for all-cause mortality prediction in a cohort of healthy older adults are reported. The models are based on features covering anthropometric variables, physical and lab examinations, questionnaires, and lifestyles, as well as wearable data collected in free-living settings, obtained for the "Healthy Ageing Initiative" study conducted on 2291 recruited participants. Several machine learning techniques including feature engineering, feature selection, data augmentation and resampling were investigated for this purpose. A detailed empirical comparison of the impact of the different techniques is presented and discussed. The achieved performances were also compared with a standard epidemiological model. This investigation showed that, for the dataset under consideration, the best results were achieved with Random UnderSampling in conjunction with Random Forest (either with or without probability calibration). However, while including probability calibration slightly reduced the average performance, it increased the model robustness, as indicated by the lower 95% confidence intervals. 
The analysis showed that machine learning models could provide comparable results to standard epidemiological models while being completely data-driven and disease-agnostic, thus demonstrating the opportunity for building machine learning models on health records data for research and clinical practice. However, further testing is required to significantly improve the model performance and its robustness.
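
Random undersampling, the resampling step the abstract above pairs with Random Forest, can be sketched as follows. This is a minimal standard-library illustration under stated assumptions, not the study's pipeline: the function name `random_undersample`, the fixed seed, and the toy data are all hypothetical. Each class is randomly reduced to the size of the smallest class before a classifier is trained.

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop samples so every class matches the smallest class size."""
    rng = random.Random(seed)  # fixed seed for reproducibility (an assumption)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample without replacement down to the minority-class size.
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy imbalanced data: 8 survivors (0) and 2 deaths (1), illustration only.
rows = list(range(10))
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_undersample(rows, labels)
```

Undersampling should be applied to the training split only, so that the test set keeps the original class distribution and performance estimates stay realistic.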

Lure AC, Du X, Black EW, Irons R, Lemas DJ, Taylor JA, Lavilla O, de la Cruz D, Neu J. Using machine learning analysis to assist in differentiating between necrotizing enterocolitis and spontaneous intestinal perforation: A novel predictive analytic tool. J Pediatr Surg 2021;56:1703-1710. [PMID: 33342603 DOI: 10.1016/j.jpedsurg.2020.11.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 08/12/2020] [Revised: 10/27/2020] [Accepted: 11/07/2020] [Indexed: 02/06/2023]

Satheeshkumar PS, El-Dallal M, Mohan MP. Feature selection and predicting chemotherapy-induced ulcerative mucositis using machine learning methods. Int J Med Inform 2021;154:104563. [PMID: 34479094 DOI: 10.1016/j.ijmedinf.2021.104563] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/26/2021] [Revised: 08/17/2021] [Accepted: 08/24/2021] [Indexed: 11/28/2022]

Abstract

OBJECTIVE

Ulcerative mucositis (UM) is a devastating complication of most cancer therapies, and its risk factors are poorly recognized. Because risk prediction is vital for adverse events, we used machine learning (ML) approaches to predict chemotherapy-induced UM.

METHODS

We used the 2017 National Inpatient Sample database to identify discharges with antineoplastic chemotherapy-induced UM among patients who received chemotherapy as part of their cancer treatment. We used forward selection and backward elimination for feature selection; lasso and the Gradient Boosting Method were used to build our linear and non-linear models, respectively.

RESULTS

In 2017, there were 253 (unweighted) chemotherapy-induced UM patient discharges among 21,626 (unweighted) adult patients who received antineoplastic chemotherapy as part of their cancer treatment. Our linear model, lasso, achieved an AUC (C-statistic) of 0.75 in both the training and test datasets; the Gradient Boosting Method (GBM) model achieved an AUC of 0.76 in the training dataset and 0.79 in the test dataset. Feature selection by stepwise forward selection and backward elimination identified the following important variables: antineoplastic chemotherapy-induced pancytopenia, agranulocytosis due to cancer chemotherapy, fluid and electrolyte imbalance, age, anemia due to chemotherapy, median household income, and depression. The most important variables derived from GBM, in order of importance, were antineoplastic chemotherapy-induced pancytopenia > co-morbidity score > agranulocytosis due to cancer chemotherapy > age > fluid and electrolyte imbalance. Further, when the analysis was restricted to females only, the ML models performed better than the unstratified model.

CONCLUSION

Our study showed that ML methods performed well in predicting chemotherapy-induced UM. Predictors identified through the ML approach matched clinically meaningful and previously discussed predictors of chemotherapy-induced UM.

Understanding current states of machine learning approaches in medical informatics: a systematic literature review. Health Technol 2021. [DOI: 10.1007/s12553-021-00538-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/25/2022]

van der Wall HEC, Doll RJ, van Westen GJP, Koopmans I, Zuiker RG, Burggraaf J, Cohen AF. The use of machine learning improves the assessment of drug-induced driving behaviour. Accid Anal Prev 2020;148:105822. [PMID: 33125924 DOI: 10.1016/j.aap.2020.105822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2020] [Revised: 09/22/2020] [Accepted: 09/30/2020] [Indexed: 06/11/2023]

Abstract

RATIONALE

Car-driving performance is negatively affected by the intake of alcohol, tranquillizers, sedatives and sleep deprivation. Although several studies have shown that the standard deviation of the lateral position on the road (SDLP) is sensitive to drug-induced changes in simulated and real driving performance tests, this parameter alone might not fully assess and quantify deviant or unsafe driving.

OBJECTIVE

Using machine learning, we investigated whether including multiple simulator-derived parameters, rather than the SDLP alone, would provide a more accurate assessment of the effect of substances affecting driving performance. We specifically analysed the effects of alcohol and alprazolam.

METHODS

The data used in the present study were collected during a previous study on driving effects of alcohol and alprazolam in 24 healthy subjects (12 M, 12 F, mean age 26 years, range 20-43 years). Various driving features, such as speed and steering variations, were quantified and the influence of administration of alcohol or alprazolam was assessed to assist in designing a predictive model for abnormal driving behaviour.

RESULTS

Adding features besides the SDLP increased the model performance for prediction of drug-induced abnormal driving behaviour (accuracy rose from 65% to 83% after alprazolam intake and from 50% to 76% after alcohol ingestion). Driving behaviour influenced by alcohol and alprazolam was characterised by different feature importance, indicating that the two interventions influenced driving behaviour in different ways.

CONCLUSION

Machine learning using multiple driving features in addition to the state-of-the-art SDLP improves the assessment of drug-induced abnormal driving behaviour. The created models may facilitate quantitative description of abnormal driving behaviour in the development and application of psychopharmacological medicines. Our models require further validation using similar and unknown interventions.
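For readers unfamiliar with the SDLP, the sketch below shows one way the metric, together with speed and steering variation, could be turned into a small feature vector for a model. It is illustrative only: the function names, the trace values, the units, and the choice of the sample standard deviation are assumptions, not the authors' pipeline.

```python
import statistics


def sdlp(lateral_positions_cm):
    """Standard deviation of lateral position.
    Uses the sample standard deviation, which is an assumption here."""
    return statistics.stdev(lateral_positions_cm)


def driving_features(lateral_positions_cm, speeds_kmh, steering_angles_deg):
    """Combine the SDLP with speed and steering variation in one feature dict."""
    return {
        "sdlp": sdlp(lateral_positions_cm),
        "speed_sd": statistics.stdev(speeds_kmh),
        "steering_sd": statistics.stdev(steering_angles_deg),
    }


# Hypothetical simulator traces sampled at four time points.
features = driving_features(
    lateral_positions_cm=[10.0, 12.0, 8.0, 10.0],
    speeds_kmh=[98.0, 100.0, 102.0, 100.0],
    steering_angles_deg=[-1.0, 1.0, 0.0, 0.0],
)
```

A classifier trained on the whole dictionary, rather than on `sdlp` alone, mirrors the comparison described in the abstract.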

Wollenstein-Betech S, Cassandras CG, Paschalidis IC. Personalized predictive models for symptomatic COVID-19 patients using basic preconditions: Hospitalizations, mortality, and the need for an ICU or ventilator. Int J Med Inform 2020;142:104258. [PMID: 32927229 PMCID: PMC7442577 DOI: 10.1016/j.ijmedinf.2020.104258] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 05/03/2020] [Revised: 07/26/2020] [Accepted: 08/17/2020] [Indexed: 01/08/2023]

Abstract

BACKGROUND

The rapid global spread of the SARS-CoV-2 virus has provoked a spike in demand for hospital care. Hospital systems across the world have been over-extended, including in Northern Italy, Ecuador, and New York City, and many other systems face similar challenges. As a result, decisions on how to best allocate very limited medical resources and design targeted policies for vulnerable subgroups have come to the forefront. Specifically, under consideration are decisions on who to test, who to admit into hospitals, who to treat in an Intensive Care Unit (ICU), and who to support with a ventilator. Given today's ability to gather, share, analyze and process data, personalized predictive models based on demographics and information regarding prior conditions can be used to (1) help decision-makers allocate limited resources, when needed, (2) advise individuals how to better protect themselves given their risk profile, (3) differentiate social distancing guidelines based on risk, and (4) prioritize vaccinations once a vaccine becomes available.

OBJECTIVE

To develop personalized models that predict the following events: (1) hospitalization, (2) mortality, (3) need for ICU, and (4) need for a ventilator. To predict hospitalization, it is assumed that one has access to a patient's basic preconditions, which can be easily gathered without the need to be at a hospital and hence serve citizens and policy makers to assess individual risk during a pandemic. For the remaining models, different versions developed include different sets of a patient's features, with some including information on how the disease is progressing (e.g., diagnosis of pneumonia).

MATERIALS AND METHODS

National data from a publicly available repository, updated daily, containing information from approximately 91,000 patients in Mexico were used. The data for each patient include demographics, prior medical conditions, SARS-CoV-2 test results, hospitalization, mortality and whether a patient has developed pneumonia or not. Several classification methods were applied and compared, including robust versions of logistic regression, and support vector machines, as well as random forests and gradient boosted decision trees.

RESULTS

Interpretable methods (logistic regression and support vector machines) perform just as well as more complex models in terms of accuracy and detection rates, with the additional benefit of elucidating variables on which the predictions are based. Classification accuracies reached 72%, 79%, 89%, and 90% for predicting hospitalization, mortality, need for ICU and need for a ventilator, respectively. The analysis reveals the most important preconditions for making the predictions. For the four models derived, these are: (1) for hospitalization: age, pregnancy, diabetes, gender, chronic renal insufficiency, and immunosuppression; (2) for mortality: age, immunosuppression, chronic renal insufficiency, obesity and diabetes; (3) for ICU need: development of pneumonia (if available), age, obesity, diabetes and hypertension; and (4) for ventilator need: ICU and pneumonia (if available), age, obesity, and hypertension.
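The abstract compares models by accuracy and detection rate. As a rough illustration of how the two differ, the sketch below computes both from binary labels and predictions. It assumes that "detection rate" means the true positive rate (recall), which the abstract does not define; the function name and toy data are hypothetical.

```python
def accuracy_and_detection_rate(y_true, y_pred):
    """Return (accuracy, detection rate) for binary labels.
    Detection rate is taken to be TP / (TP + FN), an assumption."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Undefined when there are no positive cases at all.
    detection_rate = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, detection_rate


# Hypothetical example: 1 = patient was hospitalized, 0 = not hospitalized.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
accuracy, detection_rate = accuracy_and_detection_rate(y_true, y_pred)
```

Reporting both numbers matters when outcomes are rare, such as ventilator need: a model can reach high accuracy while detecting few of the positive cases.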

Fu Y, Yang B, Ma Y, Sun Q, Yao J, Fu W, Yin W. Effect of particle size on magnesite flotation based on kinetic studies and machine learning simulation. POWDER TECHNOL 2020. [DOI: 10.1016/j.powtec.2020.08.054] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/29/2022]

Wollenstein-Betech S, Cassandras CG, Paschalidis IC. Personalized Predictive Models for Symptomatic COVID-19 Patients Using Basic Preconditions: Hospitalizations, Mortality, and the Need for an ICU or Ventilator. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2020:2020.05.03.20089813. [PMID: 32511489 PMCID: PMC7273257 DOI: 10.1101/2020.05.03.20089813] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 01/08/2023]

Abstract

BACKGROUND

The rapid global spread of the virus SARS-CoV-2 has provoked a spike in demand for hospital care. Hospital systems across the world have been over-extended, including in Northern Italy, Ecuador, and New York City, and many other systems face similar challenges. As a result, decisions on how to best allocate very limited medical resources have come to the forefront. Specifically, under consideration are decisions on who to test, who to admit into hospitals, who to treat in an Intensive Care Unit (ICU), and who to support with a ventilator. Given today's ability to gather, share, analyze and process data, personalized predictive models based on demographics and information regarding prior conditions can be used to (1) help decision-makers allocate limited resources, when needed, (2) advise individuals how to better protect themselves given their risk profile, (3) differentiate social distancing guidelines based on risk, and (4) prioritize vaccinations once a vaccine becomes available.

OBJECTIVE

To develop personalized models that predict the following events: (1) hospitalization, (2) mortality, (3) need for ICU, and (4) need for a ventilator. To predict hospitalization, it is assumed that one has access to a patient's basic preconditions, which can be easily gathered without the need to be at a hospital. For the remaining models, different versions developed include different sets of a patient's features, with some including information on how the disease is progressing (e.g., diagnosis of pneumonia).

MATERIALS AND METHODS

Data from a publicly available repository, updated daily, containing information from approximately 91,000 patients in Mexico were used. The data for each patient include demographics, prior medical conditions, SARS-CoV-2 test results, hospitalization, mortality and whether a patient has developed pneumonia or not. Several classification methods were applied, including robust versions of logistic regression, and support vector machines, as well as random forests and gradient boosted decision trees.

RESULTS

Interpretable methods (logistic regression and support vector machines) perform just as well as more complex models in terms of accuracy and detection rates, with the additional benefit of elucidating variables on which the predictions are based. Classification accuracies reached 61%, 76%, 83%, and 84% for predicting hospitalization, mortality, need for ICU and need for a ventilator, respectively. The analysis reveals the most important preconditions for making the predictions. For the four models derived, these are: (1) for hospitalization: age, gender, chronic renal insufficiency, diabetes, immunosuppression; (2) for mortality: age, SARS-CoV-2 test status, immunosuppression and pregnancy; (3) for ICU need: development of pneumonia (if available), cardiovascular disease, asthma, and SARS-CoV-2 test status; and (4) for ventilator need: ICU and pneumonia (if available), age, gender, cardiovascular disease, obesity, pregnancy, and SARS-CoV-2 test result.
