1
Lemas DJ, Du X, Rouhizadeh M, Lewis B, Frank S, Wright L, Spirache A, Gonzalez L, Cheves R, Magalhães M, Zapata R, Reddy R, Xu K, Parker L, Harle C, Young B, Louis-Jaques A, Zhang B, Thompson L, Hogan WR, Modave F. Classifying early infant feeding status from clinical notes using natural language processing and machine learning. Sci Rep 2024; 14:7831. [PMID: 38570569 PMCID: PMC10991582 DOI: 10.1038/s41598-024-58299-x] [Received: 09/20/2023] [Accepted: 03/27/2024] [Indexed: 04/05/2024]
Abstract
The objective of this study is to develop and evaluate natural language processing (NLP) and machine learning models to predict infant feeding status from clinical notes in the Epic electronic health records system. The primary outcome was the classification of infant feeding status from clinical notes using Medical Subject Headings (MeSH) terms. Notes were annotated in TeamTat so that each clinical note received a single infant feeding status label. We trained 6 machine learning models to classify infant feeding status: logistic regression, random forest, XGBoost gradient boosting, k-nearest neighbors, and support-vector classifier. Models were compared on overall accuracy, precision, recall, and F1 score. Our modeling corpus was balanced, with an equal number of clinical notes in each class. We manually reviewed 999 notes representing 746 mother-infant dyads, with a mean gestational age of 38.9 weeks and a mean maternal age of 26.6 years. The most frequent feeding status in this study was exclusive breastfeeding [n = 183 (18.3%)], followed by exclusive formula/bottle feeding [n = 146 (14.6%)] and exclusive feeding of expressed mother's milk [n = 102 (10.2%)], with mixed feeding being the least frequent [n = 23 (2.3%)]. Our final analysis classified clinical notes as breast, formula/bottle, or missing. The machine learning models were trained on these three classes after class balancing and downsampling. The XGBoost model outperformed all others, achieving an accuracy of 90.1%, a macro-averaged precision of 90.3%, a macro-averaged recall of 90.1%, and a macro-averaged F1 score of 90.1%. Our results demonstrate that natural language processing can be applied to clinical notes stored in electronic health records to classify infant feeding status.
Early identification of breastfeeding status using NLP on unstructured electronic health records data can be used to inform precision public health interventions focused on improving lactation support for postpartum patients.
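The macro-averaged metrics reported in this abstract weight each of the three classes equally, regardless of class size. The following is a minimal Python sketch of class balancing by downsampling and of macro-averaged scoring; it is not the authors' pipeline. It assumes hypothetical string labels ("breast", "formula", "missing") held in plain lists, and the helper names `downsample` and `macro_scores` are illustrative. Macro F1 is taken as the mean of per-class F1 scores (the convention of scikit-learn's `f1_score(average="macro")`), and downsampling simply draws, without replacement, as many notes from each class as the smallest class contains.

```python
import random
from collections import defaultdict


def downsample(records, label_of, seed=0):
    """Randomly downsample every class to the size of the smallest class.

    `records` is any list of items; `label_of` maps an item to its class label.
    (Hypothetical helper for illustration only.)
    """
    by_class = defaultdict(list)
    for r in records:
        by_class[label_of(r)].append(r)
    n_min = min(len(items) for items in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = []
    for label in sorted(by_class):
        balanced.extend(rng.sample(by_class[label], n_min))
    return balanced


def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Each macro score is the unweighted mean of the per-class scores.
    """
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, sum(precisions) / k, sum(recalls) / k, sum(f1s) / k
```

When every class has the same number of test notes, macro-averaged recall reduces algebraically to accuracy (the mean of TP_c / N equals the sum of TP_c divided by kN). This is consistent with, though not proof of, the identical 90.1% accuracy and macro recall reported for XGBoost.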
Affiliation(s)
- Dominick J Lemas
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA.
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Gainesville, FL, 32610, USA.
- Xinsong Du
- Division of General Internal Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, 02115, USA
- Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
- Masoud Rouhizadeh
- Department of Pharmaceutical Outcomes and Policy, University of Florida College of Medicine, Gainesville, FL, 32610, USA
- Biomedical Informatics and Data Science Section, Division of General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Braeden Lewis
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Simon Frank
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Lauren Wright
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Alex Spirache
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Lisa Gonzalez
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Ryan Cheves
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Marina Magalhães
- Division of Neonatal and Developmental Medicine, Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, 94305, USA
- Ruben Zapata
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Rahul Reddy
- Department of Computer and Information Science, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, 32611, USA
- Ke Xu
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Leslie Parker
- Department of Biobehavioral Nursing Science, University of Florida College of Nursing, Gainesville, FL, 32603, USA
- Chris Harle
- Health Policy and Management Department, Richard M. Fairbanks School of Public Health, Indiana University-Purdue University Indianapolis, Indianapolis, IN, 46202, USA
- Bridget Young
- Division of Breastfeeding and Lactation Medicine, University of Rochester Medical Center, Rochester, NY, 14642, USA
- Adetola Louis-Jaques
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Gainesville, FL, 32610, USA
- Bouri Zhang
- Health Science Center Libraries, University of Florida, Gainesville, FL, 32610, USA
- Lindsay Thompson
- Department of Pediatrics, Wake Forest School of Medicine, Winston-Salem, NC, 27101, USA
- William R Hogan
- Data Science Institute, Medical College of Wisconsin, Milwaukee, WI, 53226, USA
- François Modave
- Department of Anesthesiology, University of Florida College of Medicine, Gainesville, FL, 32610, USA
2
García-de-León-Chocano R, Sáez C, Muñoz-Soler V, Oliver-Roig A, García-de-León-González R, García-Gómez JM. Robust estimation of infant feeding indicators by data quality assessment of longitudinal electronic health records from birth up to 18 months of life. Computer Methods and Programs in Biomedicine 2021; 207:106147. [PMID: 34020376 DOI: 10.1016/j.cmpb.2021.106147] [Received: 07/03/2020] [Accepted: 04/26/2021] [Indexed: 06/12/2023]
Abstract
BACKGROUND AND OBJECTIVE The Baby-Friendly Hospital Initiative (BFHI) is an international strategy aimed at improving breastfeeding practices in health care services. Regular monitoring of indicators is key to BFHI implementation and maintenance. Routine data collected from electronic health records (EHR) are an excellent source for infant feeding monitoring; however, a data quality (DQ) assessment should be undertaken. The aim of this research is to enable robust estimation of infant feeding indicators through DQ assessment of routine EHR data.
MATERIALS AND METHODS We used the longitudinal series of healthcare contacts of 6427 children born from 2009 to 2018 in Health Area V of Murcia (Spain). Longitudinal data came from EHR at hospital discharge and from community infant health reviews up to 18 months of age. The data of each healthcare contact contained a 24-h recall of infant feeding. We performed a DQ process in three phases: (1) an assessment of each single contact and the definition of its infant feeding status; (2) a longitudinal DQ assessment of the completeness and consistency of the series of contacts, yielding meta-information that guides the calculation, for each case, of the duration of each type of breastfeeding: exclusive breastfeeding (EBF), full breastfeeding (FBF), and any breastfeeding (ABF); and finally (3) a robust estimation of indicators and a description of the DQ of each indicator.
RESULTS We found DQ deficiencies in 30.42% of the single contacts used to establish infant feeding status for EBF, 19.02% for FBF, and 22.50% for ABF. However, after longitudinal DQ assessment, the rate of valid and reliable data reached nearly 90% for most indicators, such as "median duration of breastfeeding", for both FBF and ABF, but not for EBF.
CONCLUSIONS Despite the DQ deficiencies found in the raw data, the indicator-based DQ assurance approach proposed in this work allowed us to obtain robust estimates of the indicators, with a substantial percentage of subjects having valid information for ABF and FBF monitoring. The estimates were consistent with previously published results. The methodology presented in this study allows continuous and reliable population monitoring of BFHI infant feeding indicators from routine EHR data.
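Phase (2) of this study derives breastfeeding durations from a longitudinal series of contacts while checking completeness and consistency. The sketch below illustrates one way such a check could work; it is a hypothetical illustration, not the method of García-de-León-Chocano et al. It assumes four 24-h recall categories ordered from exclusive to no breast milk (`LEVELS`), treats EBF, FBF, and ABF as nested thresholds on that order (`INDICATOR_MIN`), marks a series as inconsistent if an indicator reappears after it has stopped, and marks it as incomplete if the contacts bracketing the stopping point are more than an assumed `max_gap_days` apart. The duration is returned as an interval between the last contact at which the indicator held and the first contact at which it did not.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical 24-h recall categories, ordered from most to least breast milk.
LEVELS = {"exclusive": 3, "predominant": 2, "partial": 1, "none": 0}
# Hypothetical minimum level at which each indicator is considered to hold.
INDICATOR_MIN = {"EBF": 3, "FBF": 2, "ABF": 1}


@dataclass
class DurationEstimate:
    last_yes_day: Optional[int]  # age (days) at the last contact where the indicator held
    first_no_day: Optional[int]  # age (days) at the first contact where it no longer held
    valid: bool                  # True if the series passed both DQ checks below


def estimate_duration(contacts: List[Tuple[int, str]], indicator: str,
                      max_gap_days: int = 90) -> DurationEstimate:
    """Interval-censored duration of one indicator from (age_in_days, category) contacts.

    `max_gap_days` is an assumed completeness threshold, not a value from the study.
    """
    threshold = INDICATOR_MIN[indicator]
    series = sorted(contacts)  # chronological order by age
    last_yes: Optional[int] = None
    first_no: Optional[int] = None
    valid = bool(series)  # an empty series cannot support any estimate
    for age, category in series:
        holds = LEVELS[category] >= threshold
        if holds:
            if first_no is not None:
                valid = False  # consistency check: indicator resumes after stopping
            else:
                last_yes = age
        elif first_no is None:
            first_no = age
    # Completeness check: the stopping point must be bracketed by close-enough contacts.
    if last_yes is not None and first_no is not None and first_no - last_yes > max_gap_days:
        valid = False
    return DurationEstimate(last_yes, first_no, valid)
```

Because EBF, FBF, and ABF are modeled here as nested thresholds on one ordered scale, the same series yields all three durations, and a contact such as "partial" ends EBF and FBF while ABF continues.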
Affiliation(s)
- Ricardo García-de-León-Chocano
- Department of Information Technology, Hospital Virgen del Castillo, Yecla, Gerencia Área de Salud V - Altiplano, Servicio Murciano de Salud, Spain.
- Carlos Sáez
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, Spain
- Verónica Muñoz-Soler
- Doctoral student, Nursing Department, Faculty of Health Sciences, University of Alicante, Spain
- Antonio Oliver-Roig
- Department of Nursing, Faculty of Health Sciences, University of Alicante, Spain
- Ricardo García-de-León-González
- Department of Paediatrics, Hospital Virgen del Castillo, Yecla, Gerencia Área de Salud V - Altiplano, Servicio Murciano de Salud, Spain
- Juan Miguel García-Gómez
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, Spain