1
Aronis JM, Ye Y, Espino J, Hochheiser H, Michaels MG, Cooper GF. A Bayesian System to Detect and Track Outbreaks of Influenza-Like Illnesses Including Novel Diseases. JMIR Public Health Surveill 2024. [PMID: 38805611 DOI: 10.2196/57349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2024]
Abstract
BACKGROUND The early identification of outbreaks of both known and novel influenza-like illnesses is an important public health problem. OBJECTIVE The design and testing of a tool that detects and tracks outbreaks of both known and novel influenza-like illnesses, such as the worldwide COVID-19 pandemic, accurately and early. METHODS This paper describes the ILI Tracker algorithm, which first models the daily occurrence of a set of known influenza-like illnesses in hospital emergency departments in a monitored region, using findings extracted from patient care reports by natural language processing. We then show how the algorithm can be extended to detect and track the presence of an unmodeled disease that may represent a novel disease outbreak. RESULTS We include results based on modeling the diseases influenza, respiratory syncytial virus, human metapneumovirus, and parainfluenza for five emergency departments in Allegheny County, Pennsylvania, from June 1, 2014 through May 31, 2015. We also include the results of detecting the outbreak of an unmodeled disease, which in retrospect was very likely an outbreak of the enterovirus EV-D68. CONCLUSIONS The results reported in this paper provide support that ILI Tracker was able to track the incidence of four modeled influenza-like diseases well over a one-year period, relative to laboratory-confirmed cases, and that it was computationally efficient in doing so. The system was also able to detect a likely novel outbreak of enterovirus D68 early in an outbreak that occurred in Allegheny County in 2014, and to clinically characterize that outbreak disease accurately.
Affiliation(s)
- John Michael Aronis
  - Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Blvd, Suite 500, Pittsburgh, US
- Ye Ye
  - Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Blvd, Suite 500, Pittsburgh, US
- Jessi Espino
  - Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Blvd, Suite 500, Pittsburgh, US
- Harry Hochheiser
  - Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Blvd, Suite 500, Pittsburgh, US
- Marian G Michaels
  - Department of Pediatrics, University of Pittsburgh School of Medicine, UPMC Children's Hospital of Pittsburgh, Pittsburgh, US
- Gregory F Cooper
  - Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Blvd, Suite 500, Pittsburgh, US
2
Hung SK, Wu CC, Singh A, Li JH, Lee C, Chou EH, Pekosz A, Rothman R, Chen KF. Developing and validating clinical features-based machine learning algorithms to predict influenza infection in influenza-like illness patients. Biomed J 2023; 46:100561. [PMID: 36150651 PMCID: PMC10498408 DOI: 10.1016/j.bj.2022.09.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Revised: 09/05/2022] [Accepted: 09/16/2022] [Indexed: 11/23/2022]
Abstract
BACKGROUND Seasonal influenza poses a significant risk, and patients can benefit from early diagnosis and treatment. However, underdiagnosis and undertreatment remain widespread. We developed and compared clinical feature-based machine learning (ML) algorithms that can accurately predict influenza infection in emergency departments (EDs) among patients with influenza-like illness (ILI). MATERIAL AND METHODS We conducted a prospective cohort study in five EDs in the US and Taiwan from 2015 to 2020. Adult patients visiting the EDs with symptoms of ILI were recruited and tested by real-time RT-PCR for influenza. We evaluated seven ML algorithms and compared their results with previously developed clinical prediction models. RESULTS Out of the 2189 enrolled patients, 1104 tested positive for influenza. The eXtreme Gradient Boosting achieved superior performance with an area under the receiver operating characteristic curve of 0.82 (95% confidence interval [CI] = 0.79-0.85), with a sensitivity of 0.92 (95% CI = 0.88-0.95), specificity of 0.89 (95% CI = 0.86-0.92), and accuracy of 0.72 (95% CI = 0.69-0.76) in the testing set over cut-offs of 0.4, 0.6 and 0.5, respectively. These results were superior to those of previously proposed clinical prediction models. The model interpretation revealed that body temperature, cough, rhinorrhea, and exposure history were positively associated with influenza infection, whereas days of illness and influenza vaccination were negatively associated with it. We also found the week of the influenza season, pulse rate, and oxygen saturation to be associated with influenza infection. CONCLUSIONS The clinical feature-based ML model outperformed conventional models for predicting influenza infection.
Affiliation(s)
- Shang-Kai Hung
  - Department of Emergency Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan, Taiwan
- Chin-Chieh Wu
  - Clinical Informatics and Medical Statistics Research Center, Chang Gung University, Taoyuan, Taiwan
- Avichandra Singh
  - Clinical Informatics and Medical Statistics Research Center, Chang Gung University, Taoyuan, Taiwan
- Jin-Hua Li
  - Clinical Informatics and Medical Statistics Research Center, Chang Gung University, Taoyuan, Taiwan
- Christian Lee
  - Department of Emergency Medicine, Baylor Scott and White All Saints Medical Center, Fort Worth, TX, USA
- Eric H Chou
  - Department of Emergency Medicine, Baylor Scott and White All Saints Medical Center, Fort Worth, TX, USA
- Andrew Pekosz
  - W. Harry Feinstone Department of Molecular Microbiology and Immunology, The Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Richard Rothman
  - Department of Emergency Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kuan-Fu Chen
  - Department of Emergency Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan, Taiwan; Clinical Informatics and Medical Statistics Research Center, Chang Gung University, Taoyuan, Taiwan; Department of Emergency Medicine, Chang Gung Memorial Hospital at Keelung, Keelung, Taiwan
3
Aronis JM, Ye Y, Espino J, Hochheiser H, Michaels MG, Cooper GF. A Bayesian System to Track Outbreaks of Influenza-Like Illnesses Including Novel Diseases. medRxiv 2023:2023.05.10.23289799. [PMID: 37293033 PMCID: PMC10246032 DOI: 10.1101/2023.05.10.23289799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
It would be highly desirable to have a tool that detects the outbreak of a new influenza-like illness, such as COVID-19, accurately and early. This paper describes the ILI Tracker algorithm that first models the daily occurrence of a set of known influenza-like illnesses in a hospital emergency department using findings extracted from patient-care reports using natural language processing. We include results based on modeling the diseases influenza, respiratory syncytial virus, human metapneumovirus, and parainfluenza for five emergency departments in Allegheny County Pennsylvania from June 1, 2010 through May 31, 2015. We then show how the algorithm can be extended to detect the presence of an unmodeled disease which may represent a novel disease outbreak. We also include results for detecting an outbreak of an unmodeled disease during the mentioned time period, which in retrospect was very likely an outbreak of Enterovirus D68.
4
Kulik T, Brankovics B, van Diepeningen AD, Bilska K, Żelechowski M, Myszczyński K, Molcan T, Stakheev A, Stenglein S, Beyer M, Pasquali M, Sawicki J, Wyrȩbek J, Baturo-Cieśniewska A. Diversity of Mobile Genetic Elements in the Mitogenomes of Closely Related Fusarium culmorum and F. graminearum sensu stricto Strains and Its Implication for Diagnostic Purposes. Front Microbiol 2020; 11:1002. [PMID: 32528440 PMCID: PMC7263005 DOI: 10.3389/fmicb.2020.01002] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/19/2020] [Accepted: 04/24/2020] [Indexed: 12/19/2022]
Abstract
Much of the mitogenome variation observed in fungal lineages seems driven by mobile genetic elements (MGEs), which have invaded their genomes throughout evolution. The variation in the distribution and nucleotide diversity of these elements appears to be the main distinction between different fungal taxa, making them promising candidates for diagnostic purposes. Fungi of the genus Fusarium display a high variation in MGE content, from MGE-poor (Fusarium oxysporum and Fusarium fujikuroi species complex) to MGE-rich mitogenomes found in the important cereal pathogens F. culmorum and F. graminearum sensu stricto. In this study, we investigated the MGE variation in these latter two species by mitogenome analysis of geographically diverse strains. In addition, a smaller set of F. cerealis and F. pseudograminearum strains was included for comparison. Forty-seven introns harboring from 0 to 3 endonucleases (HEGs) were identified in the standard set of mitochondrial protein-coding genes. Most of them belonged to the group I intron family and harbored either LAGLIDADG or GIY-YIG HEGs. Among a total of 53 HEGs, 27 were shared by all fungal strains. Most of the optional HEGs were irregularly distributed among fungal strains/species, indicating ancestral mosaicism in MGEs. However, among optional MGEs, one exhibited species-specific conservation in F. culmorum. In F. graminearum s.s., MGE patterns in cox3 and in the intergenic spacer between cox2 and nad4L may facilitate the identification of this species. Thus, our results demonstrate distinctive traits of mitogenomes for diagnostic purposes of Fusaria.
Affiliation(s)
- Tomasz Kulik
  - Department of Botany and Nature Protection, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Balazs Brankovics
  - Biointeractions & Plant Health, Wageningen Plant Research, Wageningen, Netherlands
- Katarzyna Bilska
  - Department of Botany and Nature Protection, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Maciej Żelechowski
  - Department of Botany and Nature Protection, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Kamil Myszczyński
  - Department of Botany and Nature Protection, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland; Molecular Biology Laboratory, Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, Olsztyn, Poland
- Tomasz Molcan
  - Department of Animal Anatomy and Physiology, Faculty of Biology and Biotechnology, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Alexander Stakheev
  - Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
- Sebastian Stenglein
  - National Scientific and Technical Research Council, Godoy Cruz, Argentina; Universidad Nacional del Centro de la Provincia de Buenos Aires, Tandil, Argentina
- Marco Beyer
  - Department of Environmental Research and Innovation, Agro-Environmental Systems, Luxembourg Institute of Science and Technology, Belval, Luxembourg
- Matias Pasquali
  - Department of Food, Environmental and Nutritional Sciences, University of Milan, Milan, Italy
- Jakub Sawicki
  - Department of Botany and Nature Protection, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Joanna Wyrȩbek
  - Department of Botany and Nature Protection, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Anna Baturo-Cieśniewska
  - Laboratory of Phytopathology and Molecular Mycology, Department of Biology and Plant Protection, UTP University of Science and Technology, Bydgoszcz, Poland
5
Tsui F, Ye Y, Ruiz V, Cooper GF, Wagner MM. Automated influenza case detection for public health surveillance and clinical diagnosis using dynamic influenza prevalence method. J Public Health (Oxf) 2019; 40:878-885. [PMID: 29059331 DOI: 10.1093/pubmed/fdx141] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/30/2017] [Indexed: 11/13/2022]
Abstract
Objectives To assess the performance of a Bayesian case detector (BCD) for influenza surveillance and clinical diagnosis. Methods BCD uses a Bayesian network classifier to compute the posterior probability of a patient having influenza based on 31 findings from narrative clinical notes. To assess the potential for disease surveillance, we calculated area under the receiver operating characteristic curve (AUC) to indicate BCD's ability to differentiate between influenza and non-influenza encounters in emergency department settings. To assess the potential for clinical diagnosis, we measured AUC for diagnosing influenza cases among encounters having influenza-like illnesses. We also evaluated the performance of BCD using dynamically estimated influenza prevalence, and measured sensitivity, specificity and positive predictive value. Results For influenza surveillance, BCD differentiated between influenza and non-influenza encounters well with an AUC of 0.90 and 0.97 with dynamic influenza prevalence (P < 0.0001). For clinical diagnosis, the addition of dynamic influenza prevalence to BCD significantly improved AUC from 0.63 to 0.85 to distinguish influenza from other causes of influenza-like illness. Conclusions and policy implications BCD can serve as an influenza surveillance and a differential diagnosis tool via our dynamic prevalence approach. It enhances the communication between public health and clinical practice.
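As an illustration only, the role of a dynamically estimated prevalence in a Bayesian case detector can be sketched with Bayes' rule in odds form. This is not the BCD implementation, which uses a Bayesian network over 31 findings; the function name `posterior_influenza` and the single summary likelihood ratio are assumptions made for this sketch.

```python
def posterior_influenza(likelihood_ratio: float, prevalence: float) -> float:
    """Posterior probability of influenza from Bayes' rule in odds form.

    likelihood_ratio: P(findings | influenza) / P(findings | not influenza),
        a hypothetical one-number summary of the evidence in a clinical note.
    prevalence: current (dynamically estimated) prior probability of influenza
        among encounters; must lie strictly between 0 and 1.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    prior_odds = prevalence / (1.0 - prevalence)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # The same evidence yields different posteriors as prevalence changes.
    for prevalence in (0.01, 0.10, 0.50):
        print(prevalence, round(posterior_influenza(9.0, prevalence), 3))
```

The same findings give a higher posterior in peak season than off season, which is the intuition behind updating the prior as prevalence changes.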
Affiliation(s)
- Fuchiang Tsui
  - Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Ye Ye
  - Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Victor Ruiz
  - Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Gregory F Cooper
  - Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Michael M Wagner
  - Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
6
The design and evaluation of a Bayesian system for detecting and characterizing outbreaks of influenza. Online J Public Health Inform 2019; 11:e6. [PMID: 31632600 DOI: 10.5210/ojphi.v11i2.9952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
The prediction and characterization of outbreaks of infectious diseases such as influenza remains an open and important problem. This paper describes a framework for detecting and characterizing outbreaks of influenza and the results of testing it on data from ten outbreaks collected from two locations over five years. We model outbreaks with compartment models and explicitly model non-influenza influenza-like illnesses.
7
Aronis JM, Millett NE, Wagner MM, Tsui F, Ye Y, Ferraro JP, Haug PJ, Gesteland PH, Cooper GF. A Bayesian system to detect and characterize overlapping outbreaks. J Biomed Inform 2017; 73:171-181. [PMID: 28797710 PMCID: PMC5604259 DOI: 10.1016/j.jbi.2017.08.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/28/2016] [Revised: 07/04/2017] [Accepted: 08/04/2017] [Indexed: 10/19/2022]
Abstract
Outbreaks of infectious diseases such as influenza are a significant threat to human health. Because there are different strains of influenza which can cause independent outbreaks, and influenza can affect demographic groups at different rates and times, there is a need to recognize and characterize multiple outbreaks of influenza. This paper describes a Bayesian system that uses data from emergency department patient care reports to create epidemiological models of overlapping outbreaks of influenza. Clinical findings are extracted from patient care reports using natural language processing. These findings are analyzed by a case detection system to create disease likelihoods that are passed to a multiple outbreak detection system. We evaluated the system using real and simulated outbreaks. The results show that this approach can recognize and characterize overlapping outbreaks of influenza. We describe several extensions that appear promising.
Affiliation(s)
- John M Aronis
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Nicholas E Millett
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Michael M Wagner
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Fuchiang Tsui
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Ye Ye
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Jeffrey P Ferraro
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Intermountain Healthcare, Salt Lake City, UT, USA
- Peter J Haug
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Intermountain Healthcare, Salt Lake City, UT, USA
- Per H Gesteland
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Intermountain Healthcare, Salt Lake City, UT, USA; Department of Pediatrics, University of Utah, Salt Lake City, UT, USA
- Gregory F Cooper
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
8
Afzal N, Sohn S, Scott CG, Liu H, Kullo IJ, Arruda-Olson AM. Surveillance of Peripheral Arterial Disease Cases Using Natural Language Processing of Clinical Notes. AMIA Jt Summits Transl Sci Proc 2017; 2017:28-36. [PMID: 28815100 PMCID: PMC5543345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Peripheral arterial disease (PAD) is a chronic disease that affects millions of people worldwide and yet remains underdiagnosed and undertreated. Early detection is important, because PAD is strongly associated with an increased risk of mortality and morbidity. In this study, we built a PAD surveillance system using natural language processing (NLP) for early detection of PAD from narrative clinical notes. Our NLP algorithm had excellent positive predictive value (0.93) and identified 41% of PAD cases before the initial ankle-brachial index (ABI) test date while in 12% of cases the NLP algorithm detected PAD on the same date as the ABI (the gold standard for comparison). Hence, our system ascertains PAD patients in a timely and accurate manner. In conclusion, our PAD surveillance NLP algorithm has the potential for translation to clinical practice for use in reminding clinicians to order ABI tests in patients with suspected PAD and to reinforce the implementation of guideline recommended risk modification strategies in patients diagnosed with PAD.
Affiliation(s)
- Naveed Afzal
  - Department of Health Sciences Research, Rochester MN
- Sunghwan Sohn
  - Department of Health Sciences Research, Rochester MN
- Hongfang Liu
  - Department of Health Sciences Research, Rochester MN
- Iftikhar J Kullo
  - Department of Cardiovascular Diseases, Mayo Clinic, Rochester MN
9
Ferraro JP, Ye Y, Gesteland PH, Haug PJ, Tsui FR, Cooper GF, Van Bree R, Ginter T, Nowalk AJ, Wagner M. The effects of natural language processing on cross-institutional portability of influenza case detection for disease surveillance. Appl Clin Inform 2017; 8:560-580. [PMID: 28561130 PMCID: PMC6241736 DOI: 10.4338/aci-2016-12-ra-0211] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 12/31/2016] [Accepted: 03/11/2017] [Indexed: 11/23/2022]
Abstract
OBJECTIVES This study evaluates the accuracy and portability of a natural language processing (NLP) tool for extracting clinical findings of influenza from clinical notes across two large healthcare systems. Effectiveness is evaluated on how well NLP supports downstream influenza case-detection for disease surveillance. METHODS We independently developed two NLP parsers, one at Intermountain Healthcare (IH) in Utah and the other at University of Pittsburgh Medical Center (UPMC) using local clinical notes from emergency department (ED) encounters of influenza. We measured NLP parser performance for the presence and absence of 70 clinical findings indicative of influenza. We then developed Bayesian network models from NLP processed reports and tested their ability to discriminate among cases of (1) influenza, (2) non-influenza influenza-like illness (NI-ILI), and (3) 'other' diagnosis. RESULTS On Intermountain Healthcare reports, recall and precision of the IH NLP parser were 0.71 and 0.75, respectively, and UPMC NLP parser, 0.67 and 0.79. On University of Pittsburgh Medical Center reports, recall and precision of the UPMC NLP parser were 0.73 and 0.80, respectively, and IH NLP parser, 0.53 and 0.80. Bayesian case-detection performance measured by AUROC for influenza versus non-influenza on Intermountain Healthcare cases was 0.93 (using IH NLP parser) and 0.93 (using UPMC NLP parser). Case-detection on University of Pittsburgh Medical Center cases was 0.95 (using UPMC NLP parser) and 0.83 (using IH NLP parser). For influenza versus NI-ILI on Intermountain Healthcare cases performance was 0.70 (using IH NLP parser) and 0.76 (using UPMC NLP parser). On University of Pittsburgh Medical Center cases, 0.76 (using UPMC NLP parser) and 0.65 (using IH NLP parser). CONCLUSION In all but one instance (influenza versus NI-ILI using IH cases), local parsers were more effective at supporting case-detection although performances of non-local parsers were reasonable.
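For readers less familiar with the parser metrics quoted in this abstract, recall and precision follow the standard definitions and can be sketched as below. The counts in the usage example are made up for illustration and are not taken from the study; the function name `precision_recall` is likewise an assumption.

```python
from typing import Tuple


def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float]:
    """Return (precision, recall) from raw counts.

    precision = TP / (TP + FP): share of extracted findings that are correct.
    recall    = TP / (TP + FN): share of true findings that were extracted.
    A metric with a zero denominator is reported as 0.0 by convention here.
    """
    if min(true_pos, false_pos, false_neg) < 0:
        raise ValueError("counts must be non-negative")
    extracted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / extracted if extracted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts for one parser on one annotated note set.
    print(precision_recall(true_pos=80, false_pos=20, false_neg=30))
```

A parser moved to another institution can keep its precision while losing recall, which is the pattern the abstract reports for the IH parser on UPMC notes (0.53 recall, 0.80 precision).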
Affiliation(s)
- Jeffrey P Ferraro
  - Jeffrey P. Ferraro, Homer Warner Center | Intermountain Healthcare, 5171 South Cottonwood St, Suite 220, Murray, Utah 84107, Tel: 801-244-6570
10
Ye Y, Wagner MM, Cooper GF, Ferraro JP, Su H, Gesteland PH, Haug PJ, Millett NE, Aronis JM, Nowalk AJ, Ruiz VM, López Pineda A, Shi L, Van Bree R, Ginter T, Tsui F. A study of the transferability of influenza case detection systems between two large healthcare systems. PLoS One 2017; 12:e0174970. [PMID: 28380048 PMCID: PMC5381795 DOI: 10.1371/journal.pone.0174970] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 09/27/2016] [Accepted: 03/17/2017] [Indexed: 01/16/2023]
Abstract
Objectives This study evaluates the accuracy and transferability of Bayesian case detection systems (BCD) that use clinical notes from emergency department (ED) to detect influenza cases. Methods A BCD uses natural language processing (NLP) to infer the presence or absence of clinical findings from ED notes, which are fed into a Bayesian network classifier (BN) to infer patients’ diagnoses. We developed BCDs at the University of Pittsburgh Medical Center (BCDUPMC) and Intermountain Healthcare in Utah (BCDIH). At each site, we manually built a rule-based NLP and trained a Bayesian network classifier from over 40,000 ED encounters between Jan. 2008 and May 2010 using feature selection, machine learning, and an expert debiasing approach. Transferability of a BCD in this study may be impacted by seven factors: development (source) institution, development parser, application (target) institution, application parser, NLP transfer, BN transfer, and classification task. We employed an ANOVA analysis to study their impacts on BCD performance. Results Both BCDs discriminated well between influenza and non-influenza on local test cases (AUCs > 0.92). When tested for transferability using the other institution’s cases, BCDUPMC discriminations declined minimally (AUC decreased from 0.95 to 0.94, p<0.01), and BCDIH discriminations declined more (from 0.93 to 0.87, p<0.0001). We attributed the BCDIH decline to the lower recall of the IH parser on UPMC notes. The ANOVA analysis showed five significant factors: development parser, application institution, application parser, BN transfer, and classification task. Conclusion We demonstrated high influenza case detection performance in two large healthcare systems in two geographically separated regions, providing evidentiary support for the use of automated case detection from routinely collected electronic clinical notes in national influenza surveillance. The transferability could be improved by training the Bayesian network classifier locally and increasing the accuracy of the NLP parser.
Affiliation(s)
- Ye Ye
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Michael M. Wagner
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Gregory F. Cooper
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Jeffrey P. Ferraro
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States of America
  - Intermountain Healthcare, Salt Lake City, Utah, United States of America
- Howard Su
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Per H. Gesteland
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States of America
  - Intermountain Healthcare, Salt Lake City, Utah, United States of America
  - Department of Pediatrics, University of Utah, Salt Lake City, Utah, United States of America
- Peter J. Haug
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States of America
  - Intermountain Healthcare, Salt Lake City, Utah, United States of America
- Nicholas E. Millett
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- John M. Aronis
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Andrew J. Nowalk
  - Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, Pittsburgh, Pennsylvania, United States of America
- Victor M. Ruiz
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Arturo López Pineda
  - Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Lingyun Shi
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Rudy Van Bree
  - Intermountain Healthcare, Salt Lake City, Utah, United States of America
- Thomas Ginter
  - VA Salt Lake City Healthcare System, Salt Lake City, Utah, United States of America
- Fuchiang Tsui
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
11
Ford E, Carroll JA, Smith HE, Scott D, Cassell JA. Extracting information from the text of electronic medical records to improve case detection: a systematic review. J Am Med Inform Assoc 2016; 23:1007-15. [PMID: 26911811 PMCID: PMC4997034 DOI: 10.1093/jamia/ocv180] [Citation(s) in RCA: 205] [Impact Index Per Article: 25.6] [Received: 05/12/2015] [Revised: 10/13/2015] [Accepted: 10/26/2015] [Indexed: 12/15/2022]
Abstract
BACKGROUND Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only coded parts of EMRs for case-detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality. METHODS A systematic search returned 9659 papers, 67 of which reported on the extraction of information from free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed. RESULTS Studies mainly used US hospital-based EMRs, and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning methods of extraction. Inclusion of information from text resulted in a significant improvement in algorithm sensitivity and area under the receiver operating characteristic in comparison to codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median area under the receiver operating characteristic 95% (codes + text) vs 88% (codes), P = .025). CONCLUSIONS Text in EMRs is accessible, especially with open source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics like positive predictive value (precision) and sensitivity (recall).
Affiliation(s)
- Elizabeth Ford
  - Division of Primary Care and Public Health, Brighton and Sussex Medical School, Brighton, UK
- John A Carroll
  - Department of Informatics, University of Sussex, Brighton, UK
- Helen E Smith
  - Division of Primary Care and Public Health, Brighton and Sussex Medical School, Brighton, UK
- Donia Scott
  - Department of Informatics, University of Sussex, Brighton, UK
- Jackie A Cassell
  - Division of Primary Care and Public Health, Brighton and Sussex Medical School, Brighton, UK
|
12
|
López Pineda A, Ye Y, Visweswaran S, Cooper GF, Wagner MM, Tsui FR. Comparison of machine learning classifiers for influenza detection from emergency department free-text reports. J Biomed Inform 2015; 58:60-69. [PMID: 26385375 PMCID: PMC4684714 DOI: 10.1016/j.jbi.2015.08.019] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 03/02/2015] [Revised: 05/28/2015] [Accepted: 08/21/2015] [Indexed: 12/31/2022]
Abstract
Influenza is a yearly recurrent disease that has the potential to become a pandemic. An effective biosurveillance system is required for early detection of the disease. In our previous studies, we have shown that electronic Emergency Department (ED) free-text reports can be of value to improve influenza detection in real time. This paper studies seven machine learning (ML) classifiers for influenza detection, compares their diagnostic capabilities against an expert-built influenza Bayesian classifier, and evaluates different ways of handling missing clinical information from the free-text reports. We identified 31,268 ED reports from 4 hospitals between 2008 and 2011 to form two different datasets: training (468 cases, 29,004 controls), and test (176 cases and 1620 controls). We employed Topaz, a natural language processing (NLP) tool, to extract influenza-related findings and to encode them into one of three values: Acute, Non-acute, and Missing. Results show that all ML classifiers had areas under ROCs (AUC) ranging from 0.88 to 0.93, and performed significantly better than the expert-built Bayesian model. Missing clinical information marked as a value of missing (not missing at random) had a consistently improved performance among 3 (out of 4) ML classifiers when it was compared with the configuration of not assigning a value of missing (missing completely at random). The case/control ratios did not affect the classification performance given the large number of training cases. Our study demonstrates ED reports in conjunction with the use of ML and NLP with the handling of missing value information have a great potential for the detection of infectious diseases.
Affiliation(s)
- Arturo López Pineda
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA, United States
- Ye Ye
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA, United States; Intelligent System Program, University of Pittsburgh Dietrich School of Arts and Sciences, 210 South Bouquet Street, Pittsburgh, PA, United States
- Shyam Visweswaran
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA, United States; Intelligent System Program, University of Pittsburgh Dietrich School of Arts and Sciences, 210 South Bouquet Street, Pittsburgh, PA, United States
- Gregory F Cooper
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA, United States; Intelligent System Program, University of Pittsburgh Dietrich School of Arts and Sciences, 210 South Bouquet Street, Pittsburgh, PA, United States
- Michael M Wagner
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA, United States; Intelligent System Program, University of Pittsburgh Dietrich School of Arts and Sciences, 210 South Bouquet Street, Pittsburgh, PA, United States
- Fuchiang Rich Tsui
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA, United States; Intelligent System Program, University of Pittsburgh Dietrich School of Arts and Sciences, 210 South Bouquet Street, Pittsburgh, PA, United States
|
13
|
Ye Y, Tsui FR, Wagner M, Espino JU, Li Q. Influenza detection from emergency department reports using natural language processing and Bayesian network classifiers. J Am Med Inform Assoc 2014; 21:815-23. [PMID: 24406261 PMCID: PMC4147621 DOI: 10.1136/amiajnl-2013-001934] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 04/16/2013] [Revised: 09/25/2013] [Accepted: 12/11/2013] [Indexed: 01/29/2023]
Abstract
OBJECTIVES To evaluate factors affecting performance of influenza detection, including accuracy of natural language processing (NLP), discriminative ability of Bayesian network (BN) classifiers, and feature selection. METHODS We derived a testing dataset of 124 influenza patients and 87 non-influenza (shigellosis) patients. To assess NLP finding-extraction performance, we measured the overall accuracy, recall, and precision of Topaz and MedLEE parsers for 31 influenza-related findings against a reference standard established by three physician reviewers. To elucidate the relative contribution of NLP and BN classifier to classification performance, we compared the discriminative ability of nine combinations of finding-extraction methods (expert, Topaz, and MedLEE) and classifiers (one human-parameterized BN and two machine-parameterized BNs). To assess the effects of feature selection, we conducted secondary analyses of discriminative ability using the most influential findings defined by their likelihood ratios. RESULTS The overall accuracy of Topaz was significantly better than MedLEE (with post-processing) (0.78 vs 0.71, p<0.0001). Classifiers using human-annotated findings were superior to classifiers using Topaz/MedLEE-extracted findings (average area under the receiver operating characteristic (AUROC): 0.75 vs 0.68, p=0.0113), and machine-parameterized classifiers were superior to the human-parameterized classifier (average AUROC: 0.73 vs 0.66, p=0.0059). The classifiers using the 17 'most influential' findings were more accurate than classifiers using all 31 subject-matter expert-identified findings (average AUROC: 0.76>0.70, p<0.05). CONCLUSIONS Using a three-component evaluation method we demonstrated how one could elucidate the relative contributions of components under an integrated framework. To improve classification performance, this study encourages researchers to improve NLP accuracy, use a machine-parameterized classifier, and apply feature selection methods.
Affiliation(s)
- Ye Ye
  - Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Fuchiang (Rich) Tsui
  - Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Michael Wagner
  - Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Jeremy U Espino
  - Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Qi Li
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
|
14
|
Cooper GF, Villamarin R, Rich Tsui FC, Millett N, Espino JU, Wagner MM. A method for detecting and characterizing outbreaks of infectious disease from clinical reports. J Biomed Inform 2014; 53:15-26. [PMID: 25181466 DOI: 10.1016/j.jbi.2014.08.011] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/05/2014] [Revised: 08/04/2014] [Accepted: 08/22/2014] [Indexed: 11/30/2022]
Abstract
Outbreaks of infectious disease can pose a significant threat to human health. Thus, detecting and characterizing outbreaks quickly and accurately remains an important problem. This paper describes a Bayesian framework that links clinical diagnosis of individuals in a population to epidemiological modeling of disease outbreaks in the population. Computer-based diagnosis of individuals who seek healthcare is used to guide the search for epidemiological models of population disease that explain the pattern of diagnoses well. We applied this framework to develop a system that detects influenza outbreaks from emergency department (ED) reports. The system diagnoses influenza in individuals probabilistically from evidence in ED reports that are extracted using natural language processing. These diagnoses guide the search for epidemiological models of influenza that explain the pattern of diagnoses well. Those epidemiological models with a high posterior probability determine the most likely outbreaks of specific diseases; the models are also used to characterize properties of an outbreak, such as its expected peak day and estimated size. We evaluated the method using both simulated data and data from a real influenza outbreak. The results provide support that the approach can detect and characterize outbreaks early and well enough to be valuable. We describe several extensions to the approach that appear promising.
Affiliation(s)
- Gregory F Cooper
  - Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA
- Ricardo Villamarin
  - Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA
- Fu-Chiang Rich Tsui
  - Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA
- Nicholas Millett
  - Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA
- Jeremy U Espino
  - Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA
- Michael M Wagner
  - Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA
|