1
Wagner MM, Hogan W, Levander J, Diller M. Towards Machine-FAIR: Representing software and datasets to facilitate reuse and scientific discovery by machines. J Biomed Inform 2024:104647. [PMID: 38692465 DOI: 10.1016/j.jbi.2024.104647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 04/16/2024] [Accepted: 04/28/2024] [Indexed: 05/03/2024]
Abstract
OBJECTIVE: To use software, datasets, and data formats in the domain of Infectious Disease Epidemiology as a test collection to evaluate a novel M1 use case, which we introduce in this paper. M1 is a machine that, upon receipt of a new digital object of research, exhaustively finds all valid compositions of it with existing objects. METHOD: We implemented a data-format-matching-only M1 using exhaustive search, which we refer to as M1DFM. We then ran M1DFM on the test collection and used error analysis to identify needed semantic constraints. RESULTS: Precision of M1DFM search was 61.7%. Error analysis identified needed semantic constraints and needed changes in handling of data services. Most semantic constraints were simple, but one data format was so complex that representing semantic constraints over it was practically impossible. From this we conclude, limitatively, that software developers will have to meet the machines halfway by engineering software whose inputs are simple enough that their semantic constraints can be represented, akin to the simple APIs of services. We summarize these insights as M1-FAIR guiding principles for composability and suggest a roadmap for progressively capable devices in the service of reuse and accelerated scientific discovery. CONCLUSION: Algorithmic search of digital repositories for valid workflow compositions has the potential to accelerate scientific discovery, but it requires a scalable solution to the problem of knowledge acquisition about semantic constraints on software inputs. Additionally, practical limitations on the logical complexity of semantic constraints must be respected, which has implications for the design of software.
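The abstract does not spell out the mechanics of format-only matching. The sketch below illustrates one way it could work. It assumes each digital object is described only by sets of input and output data-format labels. A single exhaustive pass pairs the new object with every existing object in both directions. It proposes a composition whenever an output format of one object equals an input format of the other. Precision is then the share of proposals a reviewer judges valid, the same kind of ratio as the 61.7% reported above. The class `DigitalObject` and the functions `format_matches` and `precision` are hypothetical names, not taken from the paper. The sketch also stops at single-step pairings rather than longer workflows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitalObject:
    """Hypothetical description of a dataset or piece of software by data format alone."""
    name: str
    inputs: frozenset   # data formats the object can consume (empty for a dataset)
    outputs: frozenset  # data formats the object produces


def format_matches(new_obj, repository):
    """Exhaustively propose (producer, consumer, format) pairings between new_obj and
    every existing object, in both directions. Only format labels are compared;
    no semantic constraints are checked, so some proposals will be invalid."""
    proposals = []
    for other in repository:
        if other == new_obj:
            continue  # do not compose an object with itself
        for producer, consumer in ((new_obj, other), (other, new_obj)):
            for fmt in sorted(producer.outputs & consumer.inputs):
                proposals.append((producer.name, consumer.name, fmt))
    return proposals


def precision(proposals, judged_valid):
    """Share of proposed compositions that a reviewer judged valid."""
    if not proposals:
        return 0.0
    return sum(1 for p in proposals if p in judged_valid) / len(proposals)


# Illustrative objects, invented for this sketch.
census = DigitalObject("census_dataset", frozenset(), frozenset({"csv"}))
simulator = DigitalObject("simulator", frozenset({"csv"}), frozenset({"hdf5"}))
plotter = DigitalObject("plotter", frozenset({"csv", "hdf5"}), frozenset())

# The simulator plays the role of the newly received object.
proposals = format_matches(simulator, [census, plotter])
print(proposals)
```

Here the simulator is proposed as a consumer of the census dataset (csv) and as a producer for the plotter (hdf5). If a reviewer rejected one of those two proposals, precision would be 0.5. In the paper's terms, the rejected cases are where semantic constraints would be needed.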
Affiliation(s)
- Michael M Wagner
- Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA
- William Hogan
- Data Science Institute, Medical College of Wisconsin, Milwaukee, WI, USA
- John Levander
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Matthew Diller
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
2
Aronis JM, Ferraro JP, Gesteland PH, Tsui F, Ye Y, Wagner MM, Cooper GF. A Bayesian approach for detecting a disease that is not being modeled. PLoS One 2020; 15:e0229658. [PMID: 32109254 PMCID: PMC7048291 DOI: 10.1371/journal.pone.0229658] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/04/2019] [Accepted: 02/12/2020] [Indexed: 11/19/2022]
Abstract
Over the past decade, outbreaks of new or reemergent viruses such as severe acute respiratory syndrome (SARS) virus, Middle East respiratory syndrome (MERS) virus, and Zika have claimed thousands of lives and cost governments and healthcare systems billions of dollars. Because the appearance of new or transformed diseases is likely to continue, the detection and characterization of emergent diseases is an important problem. We describe a Bayesian statistical model that can detect and characterize previously unknown and unmodeled diseases from patient-care reports and evaluate its performance on historical data.
Affiliation(s)
- John M. Aronis
- Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Jeffrey P. Ferraro
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States of America
- Per H. Gesteland
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States of America
- Fuchiang Tsui
- Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Ye Ye
- Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Michael M. Wagner
- Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Gregory F. Cooper
- Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
3
Tsui F, Ye Y, Ruiz V, Cooper GF, Wagner MM. Automated influenza case detection for public health surveillance and clinical diagnosis using dynamic influenza prevalence method. J Public Health (Oxf) 2019; 40:878-885. [PMID: 29059331 DOI: 10.1093/pubmed/fdx141] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/30/2017] [Indexed: 11/13/2022]
Abstract
Objectives: To assess the performance of a Bayesian case detector (BCD) for influenza surveillance and clinical diagnosis. Methods: BCD uses a Bayesian network classifier to compute the posterior probability of a patient having influenza based on 31 findings from narrative clinical notes. To assess the potential for disease surveillance, we calculated the area under the receiver operating characteristic curve (AUC) to indicate BCD's ability to differentiate between influenza and non-influenza encounters in emergency department settings. To assess the potential for clinical diagnosis, we measured AUC for diagnosing influenza cases among encounters having influenza-like illnesses. We also evaluated the performance of BCD using dynamically estimated influenza prevalence, and measured sensitivity, specificity and positive predictive value. Results: For influenza surveillance, BCD differentiated well between influenza and non-influenza encounters, with an AUC of 0.90, which rose to 0.97 with dynamic influenza prevalence (P < 0.0001). For clinical diagnosis, adding dynamic influenza prevalence to BCD significantly improved AUC for distinguishing influenza from other causes of influenza-like illness, from 0.63 to 0.85. Conclusions and policy implications: Via our dynamic prevalence approach, BCD can serve as both an influenza surveillance tool and a differential diagnosis tool. It enhances communication between public health and clinical practice.
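The effect of prevalence on a Bayesian case detector can be illustrated with the simplest possible network. This is a naive Bayes structure, in which findings are assumed to be conditionally independent given influenza status. That independence is an assumption made for illustration only. The BCD's actual network and its 31 findings are not reproduced here, and the likelihood ratios below are invented. The sketch shows why re-estimating prevalence matters: identical findings yield very different posteriors under a low off-season prior and a high epidemic prior.

```python
import math


def posterior_influenza(prevalence, likelihood_ratios):
    """P(influenza | findings) under a naive Bayes assumption.

    prevalence: prior P(influenza), e.g. a dynamically re-estimated value.
    likelihood_ratios: one value per observed finding,
        P(finding | influenza) / P(finding | no influenza).
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    # Work in log-odds so that many findings do not underflow.
    log_odds = math.log(prevalence / (1.0 - prevalence))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical likelihood ratios for two observed findings (e.g. fever, cough).
findings = [4.0, 2.5]
off_season = posterior_influenza(0.01, findings)
epidemic = posterior_influenza(0.20, findings)
print(f"off-season: {off_season:.3f}, epidemic: {epidemic:.3f}")
```

With these made-up numbers, the same two findings give a posterior of about 0.09 at 1% prevalence and about 0.71 at 20% prevalence.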
Affiliation(s)
- Fuchiang Tsui
- Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Ye Ye
- Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Victor Ruiz
- Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Gregory F Cooper
- Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Michael M Wagner
- Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
4
Meuleman T, Baden N, Haasnoot GW, Wagner MM, Picavet C, Dekkers OM, Le Cessie S, van Lith JMM, Claas FHJ, Bloemenkamp KWM. Reply to: Responsibility of scientific community in claiming to have found an association with recurrent pregnancy loss. J Reprod Immunol 2019; 134-135:35. [PMID: 31324386 DOI: 10.1016/j.jri.2019.07.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2019] [Accepted: 07/03/2019] [Indexed: 10/26/2022]
Affiliation(s)
- Tess Meuleman
- Leiden University Medical Centre, Albinusdreef 2, 2300RC, Leiden, the Netherlands
- N Baden
- Leiden University Medical Centre, Albinusdreef 2, 2300RC, Leiden, the Netherlands
- G W Haasnoot
- Leiden University Medical Centre, Albinusdreef 2, 2300RC, Leiden, the Netherlands
- M M Wagner
- Leiden University Medical Centre, Albinusdreef 2, 2300RC, Leiden, the Netherlands
- C Picavet
- Leiden University Medical Centre, Albinusdreef 2, 2300RC, Leiden, the Netherlands
- O M Dekkers
- Leiden University Medical Centre, Albinusdreef 2, 2300RC, Leiden, the Netherlands
- S Le Cessie
- Leiden University Medical Centre, Albinusdreef 2, 2300RC, Leiden, the Netherlands
- J M M van Lith
- Leiden University Medical Centre, Albinusdreef 2, 2300RC, Leiden, the Netherlands
- F H J Claas
- Leiden University Medical Centre, Albinusdreef 2, 2300RC, Leiden, the Netherlands
- K W M Bloemenkamp
- Leiden University Medical Centre, Albinusdreef 2, 2300RC, Leiden, the Netherlands
5
Meuleman T, Baden N, Haasnoot GW, Wagner MM, Dekkers OM, le Cessie S, Picavet C, van Lith JMM, Claas FHJ, Bloemenkamp KWM. Oral sex is associated with reduced incidence of recurrent miscarriage. J Reprod Immunol 2019; 133:1-6. [PMID: 30980918 DOI: 10.1016/j.jri.2019.03.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/31/2018] [Revised: 03/07/2019] [Accepted: 03/25/2019] [Indexed: 12/13/2022]
Abstract
A possible way to immunomodulate the maternal immune system before pregnancy would be exposure of the oral mucosa to paternal antigens via seminal fluid. We hypothesized that women with recurrent miscarriage have had less oral sex than women with an uneventful pregnancy. In a matched case-control study, 97 women with at least three unexplained consecutive miscarriages before the 20th week of gestation with the same partner were included. Cases were younger than 36 years at the time of the third miscarriage. The control group included 137 matched women with an uneventful pregnancy. The association between oral sex and recurrent miscarriage was assessed with conditional logistic regression, and odds ratios (ORs) were estimated. Missing data were imputed using Imputation by Chained Equations. In the matched analysis, 41 of 72 women with recurrent miscarriage had had oral sex, whereas 70 of 96 matched controls answered this question positively (56.9% vs. 72.9%; OR 0.50, 95% CI 0.25-0.97, p = 0.04). After imputation of missing exposure data (51.7%), the association became weaker (OR 0.67, 95% CI 0.36-1.24, p = 0.21). In conclusion, this study suggests a possible protective role of oral sex against recurrent miscarriage in a proportion of the cases. Future studies in women with recurrent miscarriage explained by immune abnormalities should reveal whether oral exposure to seminal plasma indeed modifies the maternal immune system, resulting in more live births.
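As an arithmetic check on the matched-analysis counts above, a crude odds ratio can be computed from the 2x2 table: 41 of 72 cases and 70 of 96 controls exposed. The interval is a Woolf log-scale 95% confidence interval. This calculation ignores matching. The reported OR of 0.50 came from conditional logistic regression on matched sets, so the crude figure is only a plausibility check, not a reproduction of the authors' analysis. The function name `odds_ratio_woolf` is hypothetical.

```python
import math


def odds_ratio_woolf(exposed_cases, n_cases, exposed_controls, n_controls, z=1.96):
    """Crude (unmatched) odds ratio with a Woolf log-scale confidence interval."""
    a = exposed_cases                   # cases exposed
    b = n_cases - exposed_cases         # cases unexposed
    c = exposed_controls                # controls exposed
    d = n_controls - exposed_controls   # controls unexposed
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Woolf interval undefined without a continuity correction")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Counts from the matched analysis reported in the abstract.
or_, lo, hi = odds_ratio_woolf(41, 72, 70, 96)
print(f"crude OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With these counts, the crude estimate is about 0.49 (95% CI 0.26 to 0.94). This is close to the reported matched estimate, although the two need not agree in general.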
Affiliation(s)
- T Meuleman
- Department of Obstetrics, Leiden University Medical Centre, Leiden, the Netherlands
- N Baden
- Department of Obstetrics, Leiden University Medical Centre, Leiden, the Netherlands
- G W Haasnoot
- Department of Immunohematology and Blood transfusion, Leiden University Medical Centre, Leiden, the Netherlands
- M M Wagner
- Department of Obstetrics, Leiden University Medical Centre, Leiden, the Netherlands
- O M Dekkers
- Department of Clinical Epidemiology, Leiden University Medical Centre, Leiden, the Netherlands
- S le Cessie
- Department of Clinical Epidemiology, Leiden University Medical Centre, Leiden, the Netherlands; Medical Statistics, Department of Biomedical Datasciences, Leiden University Medical Centre, Leiden, the Netherlands
- C Picavet
- AllthatChas Research Consultancy, Amsterdam, the Netherlands
- J M M van Lith
- Department of Obstetrics, Leiden University Medical Centre, Leiden, the Netherlands
- F H J Claas
- Department of Immunohematology and Blood transfusion, Leiden University Medical Centre, Leiden, the Netherlands
- K W M Bloemenkamp
- Department of Obstetrics, Leiden University Medical Centre, Leiden, the Netherlands; Department of Obstetrics, Wilhelmina Children Hospital Birth Centre, Division Woman and Baby, University Medical Centre Utrecht, Utrecht, the Netherlands
6
Tajgardoon M, Wagner MM, Visweswaran S, Zimmerman RK. A Novel Representation of Vaccine Efficacy Trial Datasets for Use in Computer Simulation of Vaccination Policy. AMIA Jt Summits Transl Sci Proc 2018; 2017:389-398. [PMID: 29888097 PMCID: PMC5961808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Computer simulation is the only method available for evaluating vaccination policy for rare diseases or emergency use of new vaccines. The most realistic simulation of vaccination policy is agent-based simulation (ABS) in which agents have similar socio-demographic characteristics to a population of interest. Currently, analysts use published information about vaccine efficacy (VE) as the probability that a vaccinated agent develops immunity; however, VE trials typically report only a single overall VE, or VE conditioned on one covariate (e.g., age). Thus, ABS's potential to realistically simulate the effects of co-existing diseases, gender, and other characteristics of a population is underused. We developed a Bayesian network (BN) model as a compact representation of a VE trial dataset for use in ABS of vaccination policy. We compared BN-based VEs to the VEs estimated directly from the dataset. Our evaluation results suggest that VE trials should release statistical models of their datasets for use in ABS of vaccination policy.
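For reference, vaccine efficacy is conventionally estimated as one minus the ratio of attack rates in vaccinated and unvaccinated trial participants. The sketch below computes that direct estimate separately within each level of one covariate. This is the kind of dataset-derived VE the abstract compares BN-based VEs against; the sketch does not build the Bayesian network itself. The record layout (dicts with `vaccinated`, `infected` and a covariate key), the helper `make_arm`, and the function name `stratified_ve` are assumptions for illustration.

```python
from collections import defaultdict


def stratified_ve(records, covariate):
    """Direct VE estimate per covariate stratum:
    VE = 1 - (attack rate among vaccinated) / (attack rate among unvaccinated).
    Returns None for a stratum where VE is undefined (an empty arm or no
    infections among the unvaccinated)."""
    # stratum -> vaccinated flag -> [number infected, number in arm]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for r in records:
        cell = counts[r[covariate]][bool(r["vaccinated"])]
        cell[0] += int(bool(r["infected"]))
        cell[1] += 1
    ve = {}
    for stratum, arms in counts.items():
        infected_v, n_v = arms[True]
        infected_u, n_u = arms[False]
        if n_v == 0 or n_u == 0 or infected_u == 0:
            ve[stratum] = None
        else:
            ve[stratum] = 1.0 - (infected_v / n_v) / (infected_u / n_u)
    return ve


def make_arm(age, vaccinated, n_infected, n_total):
    """Build invented trial records for one arm of one age stratum."""
    return [{"age": age, "vaccinated": vaccinated, "infected": i < n_infected}
            for i in range(n_total)]


# Invented counts, chosen only to show that VE can differ by stratum.
records = (make_arm("<65", True, 1, 10) + make_arm("<65", False, 5, 10)
           + make_arm(">=65", True, 4, 10) + make_arm(">=65", False, 5, 10))
print(stratified_ve(records, "age"))
```

With these invented counts, the estimate is 0.8 in the younger stratum and about 0.2 in the older one. An agent-based simulation could use such covariate-specific values, whereas a single overall VE would hide the difference.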
Affiliation(s)
- Michael M Wagner
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Shyam Visweswaran
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Richard K Zimmerman
- Department of Family Medicine, University of Pittsburgh, Pittsburgh, PA, USA
7
Aronis JM, Millett NE, Wagner MM, Tsui F, Ye Y, Ferraro JP, Haug PJ, Gesteland PH, Cooper GF. A Bayesian system to detect and characterize overlapping outbreaks. J Biomed Inform 2017; 73:171-181. [PMID: 28797710 DOI: 10.1016/j.jbi.2017.08.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/28/2016] [Revised: 07/04/2017] [Accepted: 08/04/2017] [Indexed: 10/19/2022]
Abstract
Outbreaks of infectious diseases such as influenza are a significant threat to human health. Because there are different strains of influenza which can cause independent outbreaks, and influenza can affect demographic groups at different rates and times, there is a need to recognize and characterize multiple outbreaks of influenza. This paper describes a Bayesian system that uses data from emergency department patient care reports to create epidemiological models of overlapping outbreaks of influenza. Clinical findings are extracted from patient care reports using natural language processing. These findings are analyzed by a case detection system to create disease likelihoods that are passed to a multiple outbreak detection system. We evaluated the system using real and simulated outbreaks. The results show that this approach can recognize and characterize overlapping outbreaks of influenza. We describe several extensions that appear promising.
Affiliation(s)
- John M Aronis
- Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Nicholas E Millett
- Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Michael M Wagner
- Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Fuchiang Tsui
- Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Ye Ye
- Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Jeffrey P Ferraro
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Intermountain Healthcare, Salt Lake City, UT, USA
- Peter J Haug
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Intermountain Healthcare, Salt Lake City, UT, USA
- Per H Gesteland
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Intermountain Healthcare, Salt Lake City, UT, USA; Department of Pediatrics, University of Utah, Salt Lake City, UT, USA
- Gregory F Cooper
- Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
8
Ye Y, Wagner MM, Cooper GF, Ferraro JP, Su H, Gesteland PH, Haug PJ, Millett NE, Aronis JM, Nowalk AJ, Ruiz VM, López Pineda A, Shi L, Van Bree R, Ginter T, Tsui F. A study of the transferability of influenza case detection systems between two large healthcare systems. PLoS One 2017; 12:e0174970. [PMID: 28380048 PMCID: PMC5381795 DOI: 10.1371/journal.pone.0174970] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 09/27/2016] [Accepted: 03/17/2017] [Indexed: 01/16/2023]
Abstract
Objectives: This study evaluates the accuracy and transferability of Bayesian case detection systems (BCDs) that use clinical notes from the emergency department (ED) to detect influenza cases. Methods: A BCD uses natural language processing (NLP) to infer the presence or absence of clinical findings from ED notes, which are fed into a Bayesian network classifier (BN) to infer patients’ diagnoses. We developed BCDs at the University of Pittsburgh Medical Center (BCDUPMC) and Intermountain Healthcare in Utah (BCDIH). At each site, we manually built a rule-based NLP parser and trained a Bayesian network classifier on over 40,000 ED encounters between Jan. 2008 and May 2010, using feature selection, machine learning, and an expert debiasing approach. Transferability of a BCD in this study may be affected by seven factors: development (source) institution, development parser, application (target) institution, application parser, NLP transfer, BN transfer, and classification task. We used analysis of variance (ANOVA) to study their impact on BCD performance. Results: Both BCDs discriminated well between influenza and non-influenza on local test cases (AUCs > 0.92). When tested for transferability using the other institution’s cases, BCDUPMC discrimination declined minimally (AUC decreased from 0.95 to 0.94, p<0.01), and BCDIH discrimination declined more (from 0.93 to 0.87, p<0.0001). We attributed the BCDIH decline to the lower recall of the IH parser on UPMC notes. The ANOVA showed five significant factors: development parser, application institution, application parser, BN transfer, and classification task. Conclusion: We demonstrated high influenza case detection performance in two large healthcare systems in two geographically separated regions, providing evidentiary support for the use of automated case detection from routinely collected electronic clinical notes in national influenza surveillance. Transferability could be improved by training the Bayesian network classifier locally and increasing the accuracy of the NLP parser.
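The AUC figures above summarize discrimination: the probability that a randomly chosen influenza encounter receives a higher classifier score than a randomly chosen non-influenza encounter. Below is a minimal sketch of that rank-based (Mann-Whitney) estimator, with ties counted as one half. The scores are invented, and this is the standard definition rather than the study's code. Comparing AUC on local versus other-institution test cases, as the transferability analysis does, amounts to calling the same function on two sets of scores.

```python
def auc(case_scores, noncase_scores):
    """Empirical AUC = P(score of a random case > score of a random non-case),
    with ties counted as 1/2 (the Mann-Whitney U statistic divided by n1 * n0)."""
    if not case_scores or not noncase_scores:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Invented posterior probabilities from a case detector.
influenza = [0.9, 0.8, 0.4]
non_influenza = [0.7, 0.3]
print(auc(influenza, non_influenza))
```

The pairwise loop is quadratic in the number of encounters. For test sets in the tens of thousands, a sort-based rank computation (or a library routine such as scikit-learn's `roc_auc_score`) would be the practical choice.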
Affiliation(s)
- Ye Ye
- Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Michael M. Wagner
- Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Gregory F. Cooper
- Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Jeffrey P. Ferraro
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States of America
- Intermountain Healthcare, Salt Lake City, Utah, United States of America
- Howard Su
- Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Per H. Gesteland
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States of America
- Intermountain Healthcare, Salt Lake City, Utah, United States of America
- Department of Pediatrics, University of Utah, Salt Lake City, Utah, United States of America
- Peter J. Haug
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States of America
- Intermountain Healthcare, Salt Lake City, Utah, United States of America
- Nicholas E. Millett
- Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- John M. Aronis
- Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Andrew J. Nowalk
- Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, Pittsburgh, Pennsylvania, United States of America
- Victor M. Ruiz
- Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Arturo López Pineda
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Lingyun Shi
- Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Rudy Van Bree
- Intermountain Healthcare, Salt Lake City, Utah, United States of America
- Thomas Ginter
- VA Salt Lake City Healthcare System, Salt Lake City, Utah, United States of America
- Fuchiang Tsui
- Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
9
Hogan WR, Wagner MM, Brochhausen M, Levander J, Brown ST, Millett N, DePasse J, Hanna J. The Apollo Structured Vocabulary: an OWL2 ontology of phenomena in infectious disease epidemiology and population biology for use in epidemic simulation. J Biomed Semantics 2016; 7:50. [PMID: 27538448 PMCID: PMC4989460 DOI: 10.1186/s13326-016-0092-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 06/26/2015] [Accepted: 08/10/2016] [Indexed: 01/03/2023]
Abstract
Background: We developed the Apollo Structured Vocabulary (Apollo-SV), an OWL2 ontology of phenomena in infectious disease epidemiology and population biology, as part of a project whose goal is to increase the use of epidemic simulators in public health practice. Apollo-SV defines a terminology for use in simulator configuration. Apollo-SV is the product of an ontological analysis of the domain of infectious disease epidemiology, with particular attention to the inputs and outputs of nine simulators. Results: Apollo-SV contains 802 classes for representing the inputs and outputs of simulators, of which approximately half are new and half are imported from existing ontologies. The most important Apollo-SV class for users of simulators is infectious disease scenario, which is a representation of an ecosystem at simulator time zero that has at least one infection process (a class) affecting at least one population (also a class). Other important classes represent ecosystem elements (e.g., households), ecosystem processes (e.g., infection acquisition and infectious disease), censuses of ecosystem elements (e.g., censuses of populations), and infectious disease control measures. In the larger project, which created an end-user application that can send the same infectious disease scenario to multiple simulators, Apollo-SV serves as the controlled terminology and strongly influences the design of the message syntax used to represent an infectious disease scenario. As we added simulators for different pathogens (e.g., malaria and dengue), the core classes of Apollo-SV have remained stable, suggesting that our conceptualization of the information required by simulators is sound. Despite adhering to the OBO Foundry principle of orthogonality, we could not reuse Infectious Disease Ontology (IDO) classes as the basis for infectious disease scenarios. We thus defined new classes in Apollo-SV for host, pathogen, infection, infectious disease, colonization, and infection acquisition. Unlike IDO, our ontological analysis extended to existing mathematical models of key biological phenomena studied by infectious disease epidemiology and population biology. Conclusion: Our ontological analysis as expressed in Apollo-SV was instrumental in developing a simulator-independent representation of infectious disease scenarios that can be run on multiple epidemic simulators. Our experience suggests the importance of extending ontological analysis of a domain to include existing mathematical models of the phenomena studied by the domain. Apollo-SV is freely available at: http://purl.obolibrary.org/obo/apollo_sv.owl.
Affiliation(s)
- William R Hogan
- University of Florida, P.O. Box 100219, 2004 Mowry Rd, Gainesville, FL, 32610-0219, USA
- Michael M Wagner
- University of Pittsburgh, 5607 Baum Boulevard, Room 434, Pittsburgh, PA, 15206, USA
- Mathias Brochhausen
- University of Arkansas for Medical Sciences, 4301 W. Markham St. Slot #782, Little Rock, AR, 72205, USA
- John Levander
- University of Pittsburgh, 5607 Baum Boulevard, Room 434G, Pittsburgh, PA, 15206, USA
- Shawn T Brown
- Pittsburgh Supercomputing Center, 300 S. Craig St., Pittsburgh, PA, 15213, USA
- Nicholas Millett
- University of Pittsburgh, 5607 Baum Boulevard, Room 435 J, Pittsburgh, PA, 15206, USA
- Jay DePasse
- Pittsburgh Supercomputing Center, 300 S. Craig St., Pittsburgh, PA, 15213, USA
- Josh Hanna
- University of Florida, P.O. Box 100212, Gainesville, FL, 32610-0212, USA
10
López Pineda A, Ye Y, Visweswaran S, Cooper GF, Wagner MM, Tsui FR. Comparison of machine learning classifiers for influenza detection from emergency department free-text reports. J Biomed Inform 2015; 58:60-69. [PMID: 26385375 PMCID: PMC4684714 DOI: 10.1016/j.jbi.2015.08.019] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 03/02/2015] [Revised: 05/28/2015] [Accepted: 08/21/2015] [Indexed: 12/31/2022]
Abstract
Influenza is a yearly recurrent disease that has the potential to become a pandemic. An effective biosurveillance system is required for early detection of the disease. In our previous studies, we have shown that electronic Emergency Department (ED) free-text reports can be of value in improving influenza detection in real time. This paper studies seven machine learning (ML) classifiers for influenza detection, compares their diagnostic capabilities against an expert-built influenza Bayesian classifier, and evaluates different ways of handling missing clinical information from the free-text reports. We identified 31,268 ED reports from 4 hospitals between 2008 and 2011 to form two different datasets: training (468 cases, 29,004 controls) and test (176 cases, 1620 controls). We employed Topaz, a natural language processing (NLP) tool, to extract influenza-related findings and to encode them into one of three values: Acute, Non-acute, and Missing. Results show that all ML classifiers had areas under the ROC curve (AUCs) ranging from 0.88 to 0.93 and performed significantly better than the expert-built Bayesian model. Marking missing clinical information with an explicit value of Missing (not missing at random) consistently improved performance in 3 of 4 ML classifiers, compared with the configuration that did not assign a value of missing (missing completely at random). The case/control ratios did not affect classification performance, given the large number of training cases. Our study demonstrates that ED reports, in conjunction with ML, NLP, and the handling of missing value information, have great potential for the detection of infectious diseases.
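The two missing-value configurations can be pictured as two encodings of the NLP output. The sketch assumes the parser returns a dict mapping each mentioned finding to Acute or Non-acute. An unmentioned finding then either receives the explicit value Missing, so missingness itself can carry evidence, or is left unassigned (None) for the learner to treat as unobserved. A naive-Bayes-style scorer is included to show the consequence. Skipping an unassigned finding is exactly marginalizing it out under that model, whereas an explicit Missing value contributes its own likelihood ratio. The finding names, likelihood ratios, and function names are hypothetical. Topaz's actual output format is not reproduced, nor is the way each of the study's classifiers handled unassigned values internally.

```python
import math

ALLOWED = ("Acute", "Non-acute")


def encode(findings, feature_names, explicit_missing=True):
    """Turn NLP output into one categorical value per feature.

    findings: dict mapping a mentioned finding to "Acute" or "Non-acute".
    explicit_missing=True: an unmentioned finding becomes the value "Missing",
        so missingness itself can carry evidence (not missing at random).
    explicit_missing=False: an unmentioned finding stays None (unassigned),
        leaving the learner to treat it as unobserved (missing completely at random).
    """
    row = []
    for name in feature_names:
        value = findings.get(name)
        if value is not None and value not in ALLOWED:
            raise ValueError(f"unexpected value {value!r} for finding {name!r}")
        if value is None and explicit_missing:
            value = "Missing"
        row.append(value)
    return row


def naive_bayes_log_odds(row, feature_names, log_lr, prior_log_odds=0.0):
    """Sum of log likelihood ratios for assigned values. Skipping a None value is
    exactly marginalizing that finding out under a naive Bayes model."""
    score = prior_log_odds
    for name, value in zip(feature_names, row):
        if value is not None:
            score += log_lr[(name, value)]
    return score


# Hypothetical findings and likelihood ratios, for illustration only.
features = ["fever", "cough", "myalgia"]
raw_lr = {
    ("fever", "Acute"): 3.0, ("fever", "Non-acute"): 0.8, ("fever", "Missing"): 0.6,
    ("cough", "Acute"): 2.0, ("cough", "Non-acute"): 0.9, ("cough", "Missing"): 0.7,
    ("myalgia", "Acute"): 2.5, ("myalgia", "Non-acute"): 0.9, ("myalgia", "Missing"): 0.5,
}
log_lr = {key: math.log(lr) for key, lr in raw_lr.items()}

report = {"fever": "Acute", "cough": "Non-acute"}  # myalgia not mentioned
for explicit in (True, False):
    row = encode(report, features, explicit_missing=explicit)
    print(explicit, row, round(naive_bayes_log_odds(row, features, log_lr), 3))
```

In this toy setup, the unmentioned myalgia lowers the score only when it is encoded as Missing. Whether that helps depends on whether missingness is truly informative in the data, which is the question the comparison in the abstract addresses.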
Affiliation(s)
- Arturo López Pineda
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA, United States
- Ye Ye
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA, United States; Intelligent System Program, University of Pittsburgh Dietrich School of Arts and Sciences, 210 South Bouquet Street, Pittsburgh, PA, United States
- Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA, United States; Intelligent System Program, University of Pittsburgh Dietrich School of Arts and Sciences, 210 South Bouquet Street, Pittsburgh, PA, United States
- Gregory F Cooper
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA, United States; Intelligent System Program, University of Pittsburgh Dietrich School of Arts and Sciences, 210 South Bouquet Street, Pittsburgh, PA, United States
- Michael M Wagner
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA, United States; Intelligent System Program, University of Pittsburgh Dietrich School of Arts and Sciences, 210 South Bouquet Street, Pittsburgh, PA, United States
- Fuchiang Rich Tsui
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA, United States; Intelligent System Program, University of Pittsburgh Dietrich School of Arts and Sciences, 210 South Bouquet Street, Pittsburgh, PA, United States
11
Cooper GF, Villamarin R, Rich Tsui FC, Millett N, Espino JU, Wagner MM. A method for detecting and characterizing outbreaks of infectious disease from clinical reports. J Biomed Inform 2014; 53:15-26. [PMID: 25181466 DOI: 10.1016/j.jbi.2014.08.011] [Received: 04/05/2014] [Revised: 08/04/2014] [Accepted: 08/22/2014]
Abstract
Outbreaks of infectious disease can pose a significant threat to human health. Thus, detecting and characterizing outbreaks quickly and accurately remains an important problem. This paper describes a Bayesian framework that links clinical diagnosis of individuals in a population to epidemiological modeling of disease outbreaks in the population. Computer-based diagnosis of individuals who seek healthcare is used to guide the search for epidemiological models of population disease that explain the pattern of diagnoses well. We applied this framework to develop a system that detects influenza outbreaks from emergency department (ED) reports. The system diagnoses influenza in individuals probabilistically from evidence extracted from ED reports using natural language processing. These diagnoses guide the search for epidemiological models of influenza that explain the pattern of diagnoses well. The epidemiological models with high posterior probability determine the most likely outbreaks of specific diseases; the models are also used to characterize properties of an outbreak, such as its expected peak day and estimated size. We evaluated the method using both simulated data and data from a real influenza outbreak. The results support the conclusion that the approach can detect and characterize outbreaks early and well enough to be valuable. We describe several extensions to the approach that appear promising.
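The idea of ranking candidate epidemiological models by their posterior probability given the pattern of diagnoses can be sketched for a tiny set of models. This Python sketch is a simplification, not the paper's method: it assumes hard daily counts of influenza diagnoses (the paper's diagnoses are probabilistic), a Poisson likelihood, and two hypothetical models (`no_outbreak`, `outbreak`) with made-up expected daily counts and priors.

```python
import math
from typing import Dict, List


def poisson_log_pmf(k: int, lam: float) -> float:
    """log P(K = k) for a Poisson distribution with mean lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def model_posteriors(observed: List[int],
                     models: Dict[str, List[float]],
                     priors: Dict[str, float]) -> Dict[str, float]:
    """P(model | data) is proportional to P(data | model) * P(model), normalized over the candidates."""
    log_post = {}
    for name, expected in models.items():
        log_lik = sum(poisson_log_pmf(k, lam) for k, lam in zip(observed, expected))
        log_post[name] = math.log(priors[name]) + log_lik
    top = max(log_post.values())  # subtract the max before exponentiating, for numerical stability
    unnorm = {name: math.exp(v - top) for name, v in log_post.items()}
    total = sum(unnorm.values())
    return {name: u / total for name, u in unnorm.items()}


# Hypothetical data: daily counts of ED patients diagnosed with influenza.
observed = [2, 4, 5, 9, 11]
models = {
    "no_outbreak": [2.0, 2.0, 2.0, 2.0, 2.0],
    "outbreak": [2.0, 3.0, 5.0, 8.0, 12.0],
}
priors = {"no_outbreak": 0.9, "outbreak": 0.1}
posterior = model_posteriors(observed, models, priors)
print(posterior)
```

Even with a low prior, the rising counts make the outbreak model far more probable; a real system would search over many parameterized models rather than two fixed curves.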
Affiliation(s)
- Gregory F Cooper
- Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA.
- Ricardo Villamarin
- Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA
- Fu-Chiang Rich Tsui
- Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA
- Nicholas Millett
- Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA
- Jeremy U Espino
- Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA
- Michael M Wagner
- Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA
12
Wagner MM, Levander JD, Brown S, Hogan WR, Millett N, Hanna J. Apollo: giving application developers a single point of access to public health models using structured vocabularies and Web services. AMIA Annu Symp Proc 2013; 2013:1415-1424. [PMID: 24551417 PMCID: PMC3900155]
Abstract
This paper describes the Apollo Web Services and Apollo-SV, its related ontology. The Apollo Web Services give an end-user application a single point of access to multiple epidemic simulators. An end user can specify an analytic problem (which we define as a configuration and a query of results) exactly once and submit it to multiple epidemic simulators. The end user represents the analytic problem using a standard syntax and vocabulary, not the native languages of the simulators. We have demonstrated the feasibility of this design by implementing a set of Apollo services that provide access to two epidemic simulators and two visualizer services.
Affiliation(s)
- Michael M Wagner
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
- John D Levander
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
- Shawn Brown
- Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, PA
- William R Hogan
- Division of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR
- Nicholas Millett
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
- Josh Hanna
- Division of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR
13
Wagner MM, van Dunné FM, Kuipers I, Thornton N, Folman CC, Ponjee GA, Oepkes D. Anti-Emm in a pregnant patient--case report. Vox Sang 2013; 106:385-6. [PMID: 24164348 DOI: 10.1111/vox.12104] [Received: 07/15/2013] [Revised: 09/25/2013] [Accepted: 09/25/2013]
Abstract
A 23-year-old primigravida of North African origin presented with a positive antibody screen at booking at 15 weeks of gestation. An antibody to a high-frequency antigen (HFA) of unknown identity was detected, which was reactive with the red blood cells of the father. This led to several challenges, including antibody identification, clinical monitoring to detect signs of haemolytic disease of the foetus and newborn (HDFN), and provision of compatible blood in case perinatal transfusion was needed. Anti-Emm was identified 2 months post-partum. This is the first published case which describes a pregnant patient with anti-Emm.
Affiliation(s)
- M M Wagner
- Department of Gynaecology, Medical Center Haaglanden, The Hague, the Netherlands; Department of Obstetrics, Leiden University Medical Center, Leiden, the Netherlands
14
Porto L, Wagner MM, Heller C. MRT und MRA beim kindlichen Schlaganfall [MRI and MRA in childhood stroke]. ROFO-FORTSCHR RONTG 2013. [DOI: 10.1055/s-0033-1352542]
15
Lee BY, Tai JHY, McGlone SM, Bailey RR, Wateska AR, Zimmer SM, Zimmerman RK, Wagner MM. The potential economic value of a 'universal' (multi-year) influenza vaccine. Influenza Other Respir Viruses 2012; 6:167-75. [PMID: 21933357 PMCID: PMC3253949 DOI: 10.1111/j.1750-2659.2011.00288.x]
Abstract
BACKGROUND Limitations of the current annual influenza vaccine have led to ongoing efforts to develop a 'universal' influenza vaccine, i.e., one that targets a ubiquitous portion of the influenza virus so that the protection from a single vaccination can persist for multiple years. OBJECTIVES To estimate the economic value of a 'universal' influenza vaccine compared with the standard annual influenza vaccine, starting vaccination in the pediatric population (2-18 year olds), over the course of their lifetime. PATIENT/METHODS Monte Carlo decision analytic computer simulation model. RESULTS The universal vaccine dominates the annual vaccine (i.e., is less costly and more effective) when it costs ≤ $100/dose and its efficacy is ≥ 75%, for both the 5- and 10-year durations. The universal vaccine is also dominant when its efficacy is ≥ 50% and it protects for 10 years. A $200 universal vaccine was cost-effective only when ≥ 75% efficacious: for a 5-year duration when annual compliance was 25%, and for a 10-year duration at all annual compliance rates. A universal vaccine is not cost-effective when it costs $200 and its efficacy is ≤ 50%. The cost-effectiveness of the universal vaccine increases with the duration of protection. CONCLUSIONS Although development of a universal vaccine requires surmounting scientific hurdles, our results delineate the circumstances under which such a vaccine would be a cost-effective alternative to the annual influenza vaccine.
Affiliation(s)
- Bruce Y Lee
- Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA.
16
Lee BY, Stalter RM, Bacon KM, Tai JHY, Bailey RR, Zimmer SM, Wagner MM. Cost-effectiveness of adjuvanted versus nonadjuvanted influenza vaccine in adult hemodialysis patients. Am J Kidney Dis 2011; 57:724-32. [PMID: 21396760 DOI: 10.1053/j.ajkd.2010.12.016] [Received: 07/27/2010] [Accepted: 12/01/2010]
Abstract
BACKGROUND Currently, more than 340,000 individuals receive long-term hemodialysis (HD) therapy for end-stage renal disease; they are particularly vulnerable to influenza, prone to more severe influenza outcomes, and less likely to achieve seroprotection from standard influenza vaccines. Influenza vaccine adjuvants, chemical or biologic compounds added to a vaccine to boost the elicited immunologic response, may help overcome this problem. STUDY DESIGN Economic stochastic decision analytic simulation model. SETTING & PARTICIPANTS US adult HD population. MODEL, PERSPECTIVE, & TIMEFRAME The model simulated the decision to use either an adjuvanted or nonadjuvanted vaccine, assumed the societal perspective, and represented a single influenza season (1 year). INTERVENTION Adjuvanted influenza vaccine at different adjuvant costs and efficacies. Sensitivity analyses explored the impact of varying the influenza clinical attack rate, influenza hospitalization rate, and influenza-related mortality. OUTCOMES Incremental cost-effectiveness ratio of adjuvanted (vs nonadjuvanted) influenza vaccine, with effectiveness measured in quality-adjusted life-years. RESULTS Adjuvanted influenza vaccine would be cost-effective (incremental cost-effectiveness ratio < $50,000/quality-adjusted life-year) at a $1 adjuvant cost (on top of the standard vaccine cost) when adjuvant efficacy (in overcoming the difference between influenza vaccine response in HD patients and healthy adults) is ≥ 60%, and economically dominant (providing both cost savings and health benefits) when the $1 adjuvant's efficacy is 100%. A $2 adjuvant would be cost-effective if adjuvant efficacy were 100%. LIMITATIONS All models are simplifications of real life and cannot capture all possible factors and outcomes. CONCLUSIONS Adjuvanted influenza vaccine with an adjuvant cost ≤ $2 could be a cost-effective strategy in a standard influenza season, depending on the potency of the adjuvant.
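The dominance and incremental cost-effectiveness ratio (ICER) criteria used in this and the preceding vaccine analysis can be written as a small decision rule. This Python sketch uses hypothetical per-person costs and QALYs, not values from either study; only the $50,000/QALY threshold comes from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    cost: float   # expected cost per person (USD)
    qalys: float  # expected quality-adjusted life-years per person


def compare(new: Strategy, old: Strategy, threshold: float = 50_000.0) -> str:
    """Classify `new` against `old`: check dominance first, then the ICER threshold."""
    d_cost = new.cost - old.cost
    d_qaly = new.qalys - old.qalys
    if d_cost == 0 and d_qaly == 0:
        return "equivalent"
    if d_cost <= 0 and d_qaly >= 0:
        return "dominant"   # no more costly and no less effective
    if d_cost >= 0 and d_qaly <= 0:
        return "dominated"  # no cheaper and no more effective
    if d_qaly > 0:
        icer = d_cost / d_qaly  # extra dollars per QALY gained
        return "cost-effective" if icer < threshold else "not cost-effective"
    return "cheaper but less effective"  # requires a separate judgment


standard = Strategy("nonadjuvanted", cost=100.0, qalys=10.0)     # hypothetical
adjuvanted = Strategy("adjuvanted", cost=101.0, qalys=10.0001)  # hypothetical
print(compare(adjuvanted, standard))  # cost-effective (ICER of about $10,000/QALY)
```

Checking dominance before computing the ratio avoids misreading a negative ICER, which can arise both for a dominant strategy and for a cheaper but less effective one.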
Affiliation(s)
- Bruce Y Lee
- Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
17
Jiang X, Wallstrom G, Cooper GF, Wagner MM. Bayesian prediction of an epidemic curve. J Biomed Inform 2008; 42:90-9. [PMID: 18593605 DOI: 10.1016/j.jbi.2008.05.013] [Received: 02/01/2008] [Revised: 05/23/2008] [Accepted: 05/30/2008]
Abstract
An epidemic curve is a graph in which the number of new cases of an outbreak disease is plotted against time. Epidemic curves are ordinarily constructed after the disease outbreak is over. However, a good estimate of the epidemic curve early in an outbreak would be invaluable to health care officials. Currently, techniques for predicting the severity of an outbreak are very limited; to predict the number of future cases, epidemiologists ordinarily just make an educated guess as to how many people might become affected. We develop a model for estimating an epidemic curve early in an outbreak, and we show results of experiments testing its accuracy.
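Constructing an observed epidemic curve from case onset dates, the quantity the model above aims to estimate early, is a simple daily tally. This minimal Python sketch uses hypothetical onset dates; it only counts observed cases and does not implement the paper's Bayesian prediction.

```python
from collections import Counter
from datetime import date, timedelta
from typing import List, Tuple


def epidemic_curve(onset_dates: List[date]) -> List[Tuple[date, int]]:
    """Count new cases per day from the first to the last onset date, keeping zero-count days."""
    if not onset_dates:
        return []
    counts = Counter(onset_dates)
    curve = []
    day, last = min(counts), max(counts)
    while day <= last:
        curve.append((day, counts[day]))  # Counter returns 0 for days with no cases
        day += timedelta(days=1)
    return curve


onsets = [date(2024, 1, 1), date(2024, 1, 1), date(2024, 1, 3)]  # hypothetical cases
for day, new_cases in epidemic_curve(onsets):
    print(day.isoformat(), new_cases)
# 2024-01-01 2
# 2024-01-02 0
# 2024-01-03 1
```

Keeping the zero-count days matters when the curve is plotted or fed to a model, since gaps would otherwise distort the time axis.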
Affiliation(s)
- Xia Jiang
- Department of Biomedical Informatics, University of Pittsburgh, Parkvale Building, M-183, 200 Meyran Avenue, Pittsburgh, PA 15260, USA.
18
Wu TSJ, Shih FYF, Yen MY, Wu JSJ, Lu SW, Chang KCM, Hsiung C, Chou JH, Chu YT, Chang H, Chiu CH, Tsui FCR, Wagner MM, Su IJ, King CC. Establishing a nationwide emergency department-based syndromic surveillance system for better public health responses in Taiwan. BMC Public Health 2008; 8:18. [PMID: 18201388 PMCID: PMC2249581 DOI: 10.1186/1471-2458-8-18] [Received: 04/25/2007] [Accepted: 01/18/2008]
Abstract
Background With international concern over emerging infectious diseases (EID) and bioterrorist attacks, public health agencies are being required to have early outbreak detection systems. A disease surveillance team was organized to establish a hospital emergency department-based syndromic surveillance system (ED-SSS) capable of automatically and electronically transmitting patient data from the hospitals responsible for emergency care throughout the country to the Centers for Disease Control in Taiwan (Taiwan-CDC), starting in March 2004. This report describes the challenges and steps involved in developing ED-SSS and the timely information it provides to improve public health decision-making. Methods Between June 2003 and March 2004, after comparing various surveillance systems used around the world and consulting with ED physicians, pediatricians, and internal medicine physicians involved in infectious disease control, the Syndromic Surveillance Research Team in Taiwan worked with the Real-time Outbreak and Disease Surveillance (RODS) Laboratory at the University of Pittsburgh to create Taiwan's ED-SSS. The system was evaluated by analyzing daily electronic ED data received in real time from the 189 hospitals participating in this system between April 1, 2004 and March 31, 2005. Results Taiwan's ED-SSS identified winter and summer spikes in two syndrome groups, influenza-like illnesses and respiratory syndrome illnesses, while total numbers of ED visits were significantly higher on weekends, national holidays, and the days of the Chinese lunar new year than on weekdays (p < 0.001). It also identified increases in the upper, lower, and total gastrointestinal (GI) syndrome groups starting in November 2004 and two clear spikes in enterovirus-like infections coinciding with the two school semesters. Using ED-SSS for surveillance of influenza-like illnesses and enterovirus-related infections has improved Taiwan's pandemic flu preparedness and disease control capabilities.
Conclusion Taiwan's ED-SSS represents the first nationwide real-time syndromic surveillance system ever established in Asia. The experiences reported herein can encourage other countries to develop their own surveillance systems. The system can be adapted to other cultural and language environments for better global surveillance of infectious diseases and international collaboration.
Affiliation(s)
- Tsung-Shu Joseph Wu
- Institute of Epidemiology, College of Public Health, National Taiwan University, Taipei City, Taiwan.
19
Tsai MC, Tsui FC, Wagner MM. An evaluation of biosurveillance grid--dynamic algorithm distribution across multiple computer nodes. AMIA Annu Symp Proc 2007; 2007:746-750. [PMID: 18693936 PMCID: PMC2655926] [Received: 03/15/2007] [Revised: 07/20/2007] [Accepted: 10/11/2007]
Abstract
Performing fast data analysis to detect disease outbreaks plays a critical role in real-time biosurveillance. In this paper, we describe and evaluate an Algorithm Distribution Manager Service (ADMS) based on grid technologies, which dynamically partitions and distributes detection algorithms across multiple computers. We compared the execution time of the analysis on a single computer and on a grid network (3 computing nodes), with and without dynamic algorithm distribution. We found that algorithms with long runtimes completed approximately three times faster in the distributed environment than on a single computer, while algorithms with short runtimes performed worse in the distributed environment. Dynamic algorithm distribution also performed better than static algorithm distribution. This pilot study shows great potential for reducing lengthy analysis time through dynamic algorithm partitioning and parallel processing, and provides the opportunity to distribute algorithms from a client to remote computers in a grid network.
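Why dynamic distribution can beat static distribution is easiest to see in a toy scheduling model. This Python sketch is a simplification, not ADMS: static distribution is modeled as a fixed round-robin assignment, dynamic distribution as longest-task-first assignment to whichever node frees up earliest, and the runtimes are hypothetical. It ignores network and coordination overhead, so it cannot reproduce the slowdown the study observed for short-runtime algorithms.

```python
import heapq
from typing import List


def static_makespan(runtimes: List[float], nodes: int) -> float:
    """Fixed round-robin assignment decided before any task runs."""
    loads = [0.0] * nodes
    for i, runtime in enumerate(runtimes):
        loads[i % nodes] += runtime
    return max(loads)


def dynamic_makespan(runtimes: List[float], nodes: int) -> float:
    """Longest task first, each sent to the node that becomes free earliest."""
    free_at = [0.0] * nodes  # a list of equal values is already a valid min-heap
    for runtime in sorted(runtimes, reverse=True):
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + runtime)
    return max(free_at)


runtimes = [9, 1, 1, 8, 1, 1, 7, 1, 1]  # hypothetical algorithm runtimes
print(sum(runtimes), static_makespan(runtimes, 3), dynamic_makespan(runtimes, 3))  # 30 24.0 10.0
```

With these hypothetical runtimes, one computer needs 30 time units, the static assignment 24 (all three long tasks land on the same node), and the dynamic assignment 10.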
Affiliation(s)
- Ming-Chi Tsai
- RODS Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
20
Hogan WR, Cooper GF, Wallstrom GL, Wagner MM, Depinay JM. The Bayesian aerosol release detector: An algorithm for detecting and characterizing outbreaks caused by an atmospheric release of Bacillus anthracis. Stat Med 2007; 26:5225-52. [DOI: 10.1002/sim.3093]
21
Brokopp C, Resultan E, Holmes H, Wagner MM. Laboratories. Handbook of Biosurveillance 2006. [PMCID: PMC7150189 DOI: 10.1016/b978-012369378-5/50010-7]
22
Que J, Tsui FC, Wagner MM. Timeliness study of radiology and microbiology reports in a healthcare system for biosurveillance. AMIA Annu Symp Proc 2006; 2006:1068. [PMID: 17238687 PMCID: PMC1839281]
Abstract
We developed a framework to measure the timeliness of two data types, radiology and microbiology reports, for detection of diseases such as inhalational anthrax (IA) in a healthcare system. We measured the timeliness of a data type as the delay between patient registration in an emergency department (ED) and receipt of that data type by a biosurveillance system. We also determined the lower and upper bounds of the median delay time (LMDT and UMDT) for the two data types to be available for detection of a single IA case. Based on the data received from the University of Pittsburgh Medical Center (UPMC) Health System, the LMDT was 1.5 days and the UMDT was 6.4 days. The study provides a range of delay times for detection of a single IA case within a healthcare system and may benefit outbreak planning and outbreak model simulation.
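The timeliness measure defined above, the delay from ED registration to receipt of a report by the biosurveillance system, can be summarized by its median. This minimal Python sketch uses hypothetical timestamps and computes only a single median delay; it does not derive the LMDT and UMDT bounds, whose exact computation the abstract does not describe.

```python
from datetime import datetime
from statistics import median
from typing import List, Tuple


def median_delay_days(events: List[Tuple[datetime, datetime]]) -> float:
    """events: (ED registration time, time the report reached the biosurveillance system)."""
    delays = [(received - registered).total_seconds() / 86400 for registered, received in events]
    return median(delays)


events = [  # hypothetical (registration, receipt) pairs
    (datetime(2006, 1, 1, 0, 0), datetime(2006, 1, 2, 0, 0)),   # 1.0 day
    (datetime(2006, 1, 1, 0, 0), datetime(2006, 1, 2, 12, 0)),  # 1.5 days
    (datetime(2006, 1, 1, 0, 0), datetime(2006, 1, 7, 0, 0)),   # 6.0 days
]
print(median_delay_days(events))  # 1.5
```

The median is robust to the long tail of late reports (the 6-day case here), which is one reason it suits delay data better than the mean.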
Affiliation(s)
- Jialan Que
- RODS Laboratory, Center of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
23
Wagner MM. Introduction. Handbook of Biosurveillance 2006. [PMCID: PMC7150169 DOI: 10.1016/b978-012369378-5/50003-x]
24
Chapman WW, Dowling JN, Wagner MM. Generating a reliable reference standard set for syndromic case classification. J Am Med Inform Assoc 2005; 12:618-29. [PMID: 16049227 PMCID: PMC1294033 DOI: 10.1197/jamia.m1841] [Received: 03/31/2005] [Accepted: 06/07/2005]
Abstract
OBJECTIVE To generate a reference standard set with representative cases from seven broad syndromic case definitions and several narrower syndromic definitions used for biosurveillance, and to measure its reliability. DESIGN From 527,228 eligible patients (1990 to 2003), we generated a set of patients potentially positive for seven syndromes by classifying all eligible patients according to their ICD-9 primary discharge diagnoses. We selected a representative subset of the cases for chart review by physicians, who read emergency department reports and assigned values to 14 variables related to the seven syndromes. MEASUREMENTS (1) Positive predictive value of the ICD-9 diagnoses; (2) prevalence of the syndromic definitions and related variables; (3) agreement between physician raters, measured by kappa, kappa corrected for bias and prevalence, and Finn's r; and (4) reliability of the reference standard classifications, measured by generalizability coefficients. RESULTS Positive predictive value of the ICD-9 classification ranged from 0.33 for botulinic to 0.86 for gastrointestinal syndrome. We generated between 80 and 566 positive cases for six of the seven syndromic definitions. Rash syndrome had low prevalence (34 cases). Agreement between physician raters was high, with kappa > 0.70 for most variables. Ratings showed no bias. Finn's r was > 0.70 for all variables. Generalizability coefficients were > 0.70 for all but three variables. CONCLUSION Of the 27 syndromes generated by the 14 variables, 21 showed high enough prevalence, agreement, and reliability to be used as reference standard definitions against which an automated syndromic classifier could be compared.
Syndromic definitions that showed poor agreement or low prevalence include febrile botulinic syndrome, febrile and nonfebrile rash syndrome, respiratory syndrome explained by a nonrespiratory or noninfectious diagnosis, and febrile and nonfebrile gastrointestinal syndrome explained by a nongastrointestinal or noninfectious diagnosis.
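Two of the agreement statistics named above can be computed directly for two raters scoring a binary variable. This minimal Python sketch uses hypothetical ratings and assumes that "kappa corrected for bias and prevalence" refers to the prevalence-adjusted bias-adjusted kappa (PABAK), which for two categories equals 2 * observed agreement - 1. Finn's r and generalizability coefficients are not shown.

```python
from typing import Sequence


def observed_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two raters and binary (0/1) ratings."""
    p_obs = observed_agreement(a, b)
    p_a, p_b = sum(a) / len(a), sum(b) / len(b)  # each rater's rate of positive ratings
    p_exp = p_a * p_b + (1 - p_a) * (1 - p_b)    # agreement expected by chance
    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_obs - p_exp) / (1 - p_exp)


def pabak(a: Sequence[int], b: Sequence[int]) -> float:
    """Prevalence-adjusted bias-adjusted kappa for two categories."""
    return 2 * observed_agreement(a, b) - 1


rater_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # hypothetical physician ratings
rater_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(round(cohen_kappa(rater_a, rater_b), 3), round(pabak(rater_a, rater_b), 3))  # 0.737 0.8
```

The gap between the two values illustrates why a low-prevalence variable can show modest kappa despite 90% raw agreement: chance agreement on the many negatives is already high.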
Affiliation(s)
- Wendy W Chapman
- RODS Laboratory, Center for Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15213-2582, USA.
25
Chapman WW, Dowling JN, Wagner MM. Classification of emergency department chief complaints into 7 syndromes: a retrospective analysis of 527,228 patients. Ann Emerg Med 2005; 46:445-55. [PMID: 16271676 DOI: 10.1016/j.annemergmed.2005.04.012] [Received: 08/25/2004] [Revised: 03/04/2005] [Accepted: 04/14/2005]
Abstract
STUDY OBJECTIVE Electronic surveillance systems often monitor triage chief complaints in the hope of detecting an outbreak earlier than is possible with traditional reporting methods. We measured the accuracy of a Bayesian chief complaint classifier called CoCo, which assigns each patient to 1 of 7 syndromic categories (respiratory, botulinic, gastrointestinal, neurologic, rash, constitutional, or hemorrhagic) based on the free-text triage chief complaint. METHODS We compared CoCo's classifications with a criterion-standard syndromic classification based on International Classification of Diseases, Ninth Revision (ICD-9) discharge diagnoses. A patient was assigned the criterion classification for a syndrome if the patient's primary diagnosis belonged to the set of ICD-9 codes associated with that syndrome. We tested CoCo on 527,228 chief complaints from patients registered at the University of Pittsburgh Medical Center emergency department (ED) between 1990 and 2003. We performed a sensitivity analysis by varying the ICD-9 codes in the criterion standard. We also tested CoCo on chief complaints from EDs in a second location (Utah). RESULTS Approximately 16% (85,569/527,228) of the patients were classified by the criterion standard into 1 of the 7 syndromes. CoCo's classification performance, given as (number of cases by criterion standard, sensitivity [95% confidence interval (CI)], specificity [95% CI]), was: respiratory (34,916, 63.1 [62.6 to 63.6], 94.3 [94.3 to 94.4]); botulinic (1,961, 30.1 [28.2 to 32.2], 99.3 [99.3 to 99.3]); gastrointestinal (20,431, 69.0 [68.4 to 69.6], 95.6 [95.6 to 95.7]); neurologic (7,393, 67.6 [66.6 to 68.7], 92.7 [92.6 to 92.8]); rash (2,232, 46.8 [44.8 to 48.9], 99.3 [99.3 to 99.3]); constitutional (10,603, 45.8 [44.9 to 46.8], 96.6 [96.6 to 96.7]); and hemorrhagic (8,033, 75.2 [74.3 to 76.2], 98.5 [98.4 to 98.5]).
The sensitivity analysis showed that the results were not affected by the choice of ICD-9 codes in the criterion standard. Classification accuracy did not differ on chief complaints from the second location. CONCLUSION Our results suggest that, for most syndromes, our chief complaint classification system can identify about half of the patients with relevant syndromic presentations, with specificities higher than 90% and positive predictive values ranging from 12% to 44%.
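The gap between high specificity and modest positive predictive value follows from Bayes' rule: when a syndrome is rare, even a small false-positive rate produces many false positives. This minimal Python sketch shows the arithmetic; the counts in the second example are hypothetical, while the first plugs in the rounded respiratory figures from this abstract, so its result is only an approximation.

```python
from typing import Dict


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, and PPV from a 2x2 table against the criterion standard."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: P(syndrome | classifier positive)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Rounded respiratory figures from the abstract: 34,916 of 527,228 patients,
# sensitivity 63.1%, specificity 94.3%.
respiratory_ppv = ppv_from_rates(0.631, 0.943, 34_916 / 527_228)
print(round(respiratory_ppv, 2))  # 0.44

print(confusion_metrics(tp=8, fp=2, fn=2, tn=88))  # hypothetical counts
```

With a prevalence of about 6.6%, the 5.7% false-positive rate applied to the large negative group yields more false positives than true positives, which is consistent with the PPV range the abstract reports despite specificities above 90%.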
Affiliation(s)
- Wendy W Chapman
- Center for Biomedical Informatics, Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA.
26
Chapman WW, Christensen LM, Wagner MM, Haug PJ, Ivanov O, Dowling JN, Olszewski RT. Classifying free-text triage chief complaints into syndromic categories with natural language processing. Artif Intell Med 2005; 33:31-40. [PMID: 15617980 DOI: 10.1016/j.artmed.2004.04.001] [Received: 01/22/2004] [Revised: 03/26/2004] [Accepted: 04/03/2004]
Abstract
OBJECTIVE To develop and evaluate a natural language processing application for classifying chief complaints into syndromic categories for syndromic surveillance. INTRODUCTION Much of the input data for artificial intelligence applications in the medical field consists of free-text patient medical records, including dictated medical reports and triage chief complaints. To be useful for automated systems, the free text must be translated into encoded form. METHODS We implemented a biosurveillance detection system from Pennsylvania to monitor the 2002 Winter Olympic Games. Because the input data were in free-text format, we used a natural language processing text classifier to automatically classify free-text triage chief complaints into the syndromic categories used by the biosurveillance system. The classifier was trained on 4700 chief complaints from Pennsylvania. We evaluated the ability of the classifier to classify free-text chief complaints into syndromic categories with a test set of 800 chief complaints from Utah. RESULTS The classifier produced the following areas under the ROC curve: Constitutional = 0.95; Gastrointestinal = 0.97; Hemorrhagic = 0.99; Neurological = 0.96; Rash = 1.0; Respiratory = 0.99; Other = 0.96. Using information stored in the system's semantic model, we extracted from the Respiratory classifications lower respiratory complaints and lower respiratory complaints with fever, with a precision of 0.97 and 0.96, respectively. CONCLUSION Results suggest that a trainable natural language processing text classifier can accurately extract data from free-text chief complaints for biosurveillance.
Affiliation(s)
- Wendy W Chapman
- The RODS Laboratory, Center for Biomedical Informatics, University of Pittsburgh, Suite 8084, Forbes Tower, Pittsburgh, PA 15213, USA.
27
Tsui FC, Espino JU, Weng Y, Choudary A, Su HD, Wagner MM. Key design elements of a data utility for national biosurveillance: event-driven architecture, caching, and Web service model. AMIA Annu Symp Proc 2005; 2005:739-43. [PMID: 16779138 PMCID: PMC1560630]
Abstract
The National Retail Data Monitor (NRDM) has monitored over-the-counter (OTC) medication sales in the United States since December 2002. The NRDM collects data from over 18,600 retail stores and processes over 0.6 million sales records per day. This paper describes key architectural features that we have found necessary for a data utility component in a national biosurveillance system. These elements include an event-driven architecture to provide analyses of data in near real time, multiple levels of caching to improve query response time, high availability through the use of clustered servers, scalable data storage through the use of storage area networks, and a web-service function for interoperation with affiliated systems. The methods and architectural principles are relevant to the design of any production data utility for public health surveillance, that is, any system that collects data from multiple sources in near real time for use by analytic programs and user interfaces that have substantial requirements for time-series data aggregated in multiple dimensions.
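Multiple levels of caching, one of the design elements listed above, can be sketched as a small, fast cache backed by a larger one, which is in turn backed by the expensive aggregation query. This Python sketch is illustrative only: the `LRUCache` class, the `aggregate_sales` function, and the region/category keys are hypothetical and are not part of the NRDM.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class LRUCache:
    """Least-recently-used cache that calls `loader` on a miss."""

    def __init__(self, capacity: int, loader: Callable[[Hashable], object]):
        self.capacity = capacity
        self.loader = loader
        self._data: "OrderedDict[Hashable, object]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> object:
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        value = self.loader(key)  # fall through to the next level
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value


db_calls = []


def aggregate_sales(key):
    """Hypothetical stand-in for an expensive time-series aggregation query."""
    db_calls.append(key)
    region, category = key
    return f"daily totals for {category} in {region}"


level2 = LRUCache(capacity=100, loader=aggregate_sales)  # larger, slower tier
level1 = LRUCache(capacity=2, loader=level2.get)         # small, fast tier

for key in [("PA", "antidiarrheals"), ("PA", "antidiarrheals"), ("UT", "cough/cold"),
            ("NY", "thermometers"), ("PA", "antidiarrheals")]:
    level1.get(key)
print(len(db_calls), level1.hits, level2.hits)  # 3 1 1
```

The final PA query misses the small first-level cache (it was evicted) but is served by the second level, so the expensive query runs only once per distinct key.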
Affiliation(s)
- Fu-Chiang Tsui
- RODS Laboratory, Center of Biomedical Informatics, University of Pittsburgh, PA 15219, USA
28
Wagner MM, Wallstrom GL, Onisko A. Issue a boil-water advisory or wait for definitive information? A decision analysis. AMIA Annu Symp Proc 2005; 2005:774-8. [PMID: 16779145 PMCID: PMC1560439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/10/2023]
Abstract
OBJECTIVE Study the decision to issue a boil-water advisory in response to a spike in sales of diarrhea remedies or wait 72 hours for the results of definitive testing of water and people. METHODS Decision analysis. RESULTS In the base-case analysis, the optimal decision is test-and-wait. If the cost of issuing a boil-water advisory is less than 13.92 cents per person per day, the optimal decision is to issue the boil-water advisory immediately. CONCLUSIONS Decisions based on surveillance data that are suggestive but not conclusive about the existence of a disease outbreak can be modeled.
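The structure of such a decision can be illustrated with a deliberately simplified expected-cost comparison. The sketch below is not the authors' model: it assumes that an advisory fully prevents waterborne illness and that both strategies behave identically once definitive results arrive after the 72-hour (3-day) wait, and it uses hypothetical parameters `p_outbreak` (probability that the sales spike reflects a real outbreak) and `illness_cost` (expected per-person daily cost of illness during an outbreak). Under those assumptions, issuing the advisory immediately is preferred when its daily cost falls below `p_outbreak * illness_cost`, which is the same kind of cost threshold the abstract reports.

```python
def expected_window_cost(issue_now, p_outbreak, advisory_cost, illness_cost, wait_days=3.0):
    """Expected per-person cost during the testing window (toy model, see lead-in)."""
    if issue_now:
        # Everyone bears the advisory's cost for the whole wait, outbreak or not.
        return advisory_cost * wait_days
    # Waiting incurs illness losses only if the outbreak is real.
    return p_outbreak * illness_cost * wait_days


def break_even_advisory_cost(p_outbreak, illness_cost):
    """Daily per-person advisory cost at which both options have equal expected cost."""
    return p_outbreak * illness_cost


def best_decision(p_outbreak, advisory_cost, illness_cost, wait_days=3.0):
    """Pick the option with the lower expected cost under the toy model."""
    now = expected_window_cost(True, p_outbreak, advisory_cost, illness_cost, wait_days)
    wait = expected_window_cost(False, p_outbreak, advisory_cost, illness_cost, wait_days)
    return "issue advisory now" if now < wait else "test and wait"
```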
Affiliation(s)
- Michael M Wagner
- RODS Laboratory, Center for Biomedical Informatics, University of Pittsburgh, USA
29
Hogan WR, Wallstrom GL, Wagner MM. An evaluation of three policies for updating product categories in the National Retail Data Monitor. AMIA Annu Symp Proc 2005; 2005:325-9. [PMID: 16779055 PMCID: PMC1560721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/10/2023]
Abstract
A problem in biosurveillance is how frequently to update controlled vocabularies that identify data elements such as laboratory tests and over-the-counter healthcare products. More frequent updates improve the completeness of data captured over time, but introducing new codes into a surveillance system may cause false alarms when codes are aggregated into analytic categories. We studied the effect of three policies for updating Universal Product Codes (UPCs), the controlled vocabulary for over-the-counter healthcare products used by the National Retail Data Monitor. To compare the policies, we analyzed historical data from two cities for the 18 product categories of the National Retail Data Monitor under annual, quarterly, and monthly UPC update policies. We measured the effect on data completeness and false alarm rate. The monthly update policy had the highest data completeness and led to the fewest additional false alarms. Overall, monthly updating of UPCs was the superior policy.
30
Abstract
Automatic detection of cases of febrile illness may have potential for early detection of outbreaks of infectious disease, either by identifying anomalous numbers of febrile cases or, in concert with other information, by helping diagnose specific syndromes such as febrile respiratory syndrome. At most institutions, information about fever is contained only in free-text clinical records. We compared the sensitivity and specificity of three algorithms for detecting fever from free text. Keyword CC and CoCo classified patients based on triage chief complaints; Keyword HP classified patients based on dictated emergency department reports. Keyword HP was the most sensitive (sensitivity 0.98, specificity 0.89), and Keyword CC was the most specific (sensitivity 0.61, specificity 1.0). Because chief complaints are available sooner than emergency department reports, we suggest a combined application that classifies patients based on their chief complaint and then reclassifies them based on their emergency department report once the report becomes available.
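A keyword classifier of the Keyword CC kind, together with the sensitivity and specificity calculation used to compare the algorithms, can be sketched in a few lines. The regular expression `FEVER_PATTERN` is a hypothetical stand-in; the study's actual keyword vocabularies are not given in the abstract.

```python
import re

# Hypothetical fever vocabulary; not the study's actual keyword list.
FEVER_PATTERN = re.compile(r"\b(fever|febrile|temp(erature)?|chills)\b", re.IGNORECASE)


def keyword_fever(chief_complaint):
    """Return True if the free-text chief complaint contains a fever keyword."""
    return bool(FEVER_PATTERN.search(chief_complaint))


def sensitivity_specificity(predictions, truth):
    """Compute (sensitivity, specificity) from parallel lists of booleans."""
    tp = sum(p and t for p, t in zip(predictions, truth))
    fn = sum((not p) and t for p, t in zip(predictions, truth))
    tn = sum((not p) and (not t) for p, t in zip(predictions, truth))
    fp = sum(p and (not t) for p, t in zip(predictions, truth))
    return tp / (tp + fn), tn / (tn + fp)
```

A bare keyword match like this does not handle negation: "denies fever" would be labeled febrile, which lowers specificity on longer narrative text.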
Affiliation(s)
- Wendy W Chapman
- RODS Laboratory, Center for Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA.
31
Wagner MM, Espino J, Tsui FC, Gesteland P, Chapman W, Ivanov O, Moore A, Wong W, Dowling J, Hutman J. Syndrome and outbreak detection using chief-complaint data--experience of the Real-Time Outbreak and Disease Surveillance project. MMWR Suppl 2004; 53:28-31. [PMID: 15714623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2023]
Abstract
This paper summarizes the experience of the Real-Time Outbreak and Disease Surveillance (RODS) project in collecting and analyzing free-text emergency department (ED) chief complaints. The technical approach involves real-time transmission of chief-complaint data as Health Level 7 messages from hospitals to a regional data center, where a Bayesian text classifier assigns each chief complaint to one of eight syndrome categories. Time-series algorithms analyze the syndrome data and generate alerts. Authorized public health users review the syndrome data by using Internet interfaces with timelines and maps. Deployments in Pennsylvania, Utah, Atlantic City, and Ohio have demonstrated feasibility of real-time collection of chief complaints. Retrospective experiments that measured case-classification accuracy demonstrated that the Bayesian classifier can discriminate between different syndrome presentations. Retrospective experiments that measured outbreak-detection accuracy determined that the classifier's performance was adequate to support accurate and timely detection of seasonal disease outbreaks. Prospective evaluation revealed that a cluster of carbon monoxide exposures was detected by RODS within 4 hours of the presentation of the first case to an emergency department.
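The abstract states only that a Bayesian text classifier assigns each chief complaint to a syndrome category. A minimal multinomial naive Bayes sketch with add-one (Laplace) smoothing shows the general idea; the class name, tokenizer, syndrome labels, and training examples are hypothetical, and this is not the RODS classifier itself.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayesChiefComplaint:
    """Multinomial naive Bayes over chief-complaint words (illustrative sketch)."""

    def __init__(self):
        self.class_counts = Counter()          # syndrome -> number of training complaints
        self.word_counts = defaultdict(Counter)  # syndrome -> word frequencies
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z]+", text.lower())

    def fit(self, complaints, syndromes):
        for text, label in zip(complaints, syndromes):
            self.class_counts[label] += 1
            for word in self.tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + v
            score = math.log(n / total)  # log prior of the syndrome
            for word in self.tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Scoring in log space avoids numeric underflow when complaints contain many words; words never seen in training are simply skipped.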
Affiliation(s)
- Michael M Wagner
- Real-Time Outbreak and Disease Laboratory, University of Pittsburgh, Suite 500, Cellomics Building, 500 Technology Drive, Pittsburgh, PA 15219, USA.
32
Wagner MM, Tsui FC, Espino J, Hogan W, Hutman J, Hersh J, Neill D, Moore A, Parks G, Lewis C, Aller R. National Retail Data Monitor for public health surveillance. MMWR Suppl 2004; 53:40-2. [PMID: 15714625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2023]
Abstract
The National Retail Data Monitor (NRDM) is a public health surveillance tool that collects and analyzes daily sales data for over-the-counter (OTC) health-care products. NRDM collects sales data for selected OTC health-care products in near real time from >15,000 retail stores and makes them available to public health officials. NRDM is one of the first examples of a national data utility for public health surveillance that collects, redistributes, and analyzes daily sales-volume data of selected health-care products, thereby reducing the effort for both data providers and health departments.
Affiliation(s)
- Michael M Wagner
- Real-Time Outbreak and Disease Surveillance Laboratory, University of Pittsburgh, Suite 500, Cellomics Building, 500 Technology Drive, Pittsburgh, PA 15219, USA.
33
Abstract
To learn how outbreaks of infectious disease are detected and to describe the entities and information systems that together function to identify outbreaks in the U.S., the authors drew on multiple sources of information to create a description of existing surveillance systems and how they interact to detect outbreaks. The results of this analysis were summarized in a system diagram. The authors reviewed a sample of recent outbreaks to determine how they were detected, with reference to the system diagram. The de facto U.S. system for detection of outbreaks consists of five components: the clinical health care system, local/state health agencies, federal agencies, academic/professional organizations, and collaborating governmental organizations. Primary data collection occurs at the level of clinical health care systems and local health agencies. The review of a convenience sample of outbreaks showed that all five components of the system participated in aggregating, analyzing, and sharing data. The authors conclude that the current U.S. approach to detection of disease outbreaks is complex and involves many organizations interacting in a loosely coupled manner. State and local health departments and the health care system are major components in the detection of outbreaks.
Affiliation(s)
- Virginia Dato
- Pennsylvania Department of Health, Southwest District Office, 514 Pittsburgh State Building, 300 Liberty Avenue, Pittsburgh, PA 15222, USA.
34
Abstract
A large number of biological agents can cause natural or bioterroristic disease outbreaks, and each can present in a bewildering number of ways (e.g., a few cases versus many cases, confined to a building versus widely disseminated). This 'problem space' is a challenge for designers of early warning systems for disease outbreaks, and the sheer size of this space is a barrier to progress. This paper addresses this problem by deriving nine categories of threats that represent a parsimonious characterization of the problem space. A literature search also identified one or more example outbreaks for each of the nine categories. These outbreaks have occurred in recent times and could be used by researchers in need of actual outbreak data for investigations of the role of different types of surveillance data and algorithms in outbreak detection. The methodological contribution of this research is a Criterion Set of threats for analysis and evaluation of detection systems. This set characterizes the problem space in a tractable manner with less loss of generality than analyses based on one or two selected diseases, which is the approach typical of current analyses.
Affiliation(s)
- Michael M Wagner
- The Real-Time Outbreak and Disease Surveillance Laboratory, Center for Biomedical Informatics, University of Pittsburgh, Suite 550, 100 Technology Drive, Pittsburgh, PA 15219, USA.
35
Mandl KD, Overhage JM, Wagner MM, Lober WB, Sebastiani P, Mostashari F, Pavlin JA, Gesteland PH, Treadwell T, Koski E, Hutwagner L, Buckeridge DL, Aller RD, Grannis S. Implementing syndromic surveillance: a practical guide informed by the early experience. J Am Med Inform Assoc 2004; 11:141-50. [PMID: 14633933 PMCID: PMC353021 DOI: 10.1197/jamia.m1356] [Citation(s) in RCA: 234] [Impact Index Per Article: 11.7] [Received: 03/05/2003] [Accepted: 09/28/2003] [Indexed: 01/04/2023]
Abstract
Syndromic surveillance refers to methods relying on detection of individual and population health indicators that are discernible before confirmed diagnoses are made. In particular, prior to the laboratory confirmation of an infectious disease, ill persons may exhibit behavioral patterns, symptoms, signs, or laboratory findings that can be tracked through a variety of data sources. Syndromic surveillance systems are being developed locally, regionally, and nationally. The efforts have been largely directed at facilitating the early detection of a covert bioterrorist attack, but the technology may also be useful for general public health, clinical medicine, quality improvement, patient safety, and research. This report, authored by developers and methodologists involved in the design and deployment of the first wave of syndromic surveillance systems, is intended to serve as a guide for informaticians, public health managers, and practitioners who are currently planning deployment of such systems in their regions.
Affiliation(s)
- Kenneth D Mandl
- Children's Hospital Informatics Program, Division of Emergency Medicine, Center for Biopreparedness, Children's Hospital Boston, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA.
36
Espino JU, Wagner MM, Tsui FC, Su HD, Olszewski RT, Lie Z, Chapman W, Zeng X, Ma L, Lu ZW, Dara J. The RODS Open Source Project: removing a barrier to syndromic surveillance. Stud Health Technol Inform 2004; 107:1192-6. [PMID: 15361001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2023]
Abstract
The goal of the Real-time Outbreak and Disease Surveillance (RODS) Open Source Project is to accelerate deployment of computer-based syndromic surveillance. To this end, the project has released the RODS software under the GNU General Public License and created an organizational structure to catalyze its development. This paper describes the design of the software, requested extensions, and the structure of the development effort.
Affiliation(s)
- Jeremy U Espino
- Real-time Outbreak and Disease Surveillance Laboratory, University of Pittsburgh, PA 15219, USA.
37
Johnson HA, Wagner MM, Hogan WR, Chapman W, Olszewski RT, Dowling J, Barnas G. Analysis of Web access logs for surveillance of influenza. Stud Health Technol Inform 2004; 107:1202-6. [PMID: 15361003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2023]
Abstract
The purpose of this study was to determine whether the level of influenza in a population correlates with the number of times that internet users access information about influenza on health-related Web sites. We obtained Web access logs from the Healthlink Web site. Web access logs contain information about the user and the information the user accessed, and are maintained electronically by most Web sites, including Healthlink. We developed weekly counts of the number of accesses of selected influenza-related articles on the Healthlink Web site and measured their correlation with traditional influenza surveillance data from the Centers for Disease Control and Prevention (CDC) using the cross-correlation function (CCF). We defined timeliness as the time lag at which the correlation was a maximum. There was a moderately strong correlation between the frequency of influenza-related article accesses and the CDC's traditional surveillance data, but the results on timeliness were inconclusive. With improvements in methods for performing spatial analysis of the data and the continuing increase in Web searching behavior among Americans, Web article access has the potential to become a useful data source for public health early warning systems.
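The timeliness measure described above, the lag at which the cross-correlation peaks, can be sketched as follows. This assumes two equal-length weekly series and uses a plain Pearson correlation on the overlapping portion at each lag; the exact CCF normalization used in the study is not stated in the abstract. A positive result means the first series (for example, article accesses) leads the second (for example, CDC surveillance counts).

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cross_correlation(x, y, max_lag):
    """Correlation of x[t] with y[t + lag] for every lag in [-max_lag, max_lag]."""
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: n + lag]
        result[lag] = pearson(a, b)
    return result


def timeliness(x, y, max_lag):
    """Lag (in periods) at which the cross-correlation is largest."""
    ccf = cross_correlation(x, y, max_lag)
    return max(ccf, key=ccf.get)
```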
Affiliation(s)
- Heather A Johnson
- RODS Laboratory, Center for Biomedical Informatics, University of Pittsburgh, PA 15219, USA.
38
Gesteland PH, Gardner RM, Tsui FC, Espino JU, Rolfs RT, James BC, Chapman WW, Moore AW, Wagner MM. Automated syndromic surveillance for the 2002 Winter Olympics. J Am Med Inform Assoc 2003; 10:547-54. [PMID: 12925547 PMCID: PMC264432 DOI: 10.1197/jamia.m1352] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.2] [Received: 02/28/2003] [Accepted: 05/14/2003] [Indexed: 11/10/2022]
Abstract
The 2002 Olympic Winter Games were held in Utah from February 8 to March 16, 2002. Following the terrorist attacks on September 11, 2001, and the anthrax release in October 2001, the need for bioterrorism surveillance during the Games was paramount. A team of informaticists and public health specialists from Utah and Pittsburgh implemented the Real-time Outbreak and Disease Surveillance (RODS) system in Utah for the Games in just seven weeks. The strategies and challenges of implementing such a system in such a short time are discussed. The motivation and cooperation inspired by the 2002 Olympic Winter Games were a powerful driver in overcoming the organizational issues. Over 114,000 acute care encounters were monitored between February 8 and March 31, 2002. No outbreaks of public health significance were detected. The system was implemented successfully and operational for the 2002 Olympic Winter Games and remains operational today.
39
Wagner MM, Robinson JM, Tsui FC, Espino JU, Hogan WR. Design of a national retail data monitor for public health surveillance. J Am Med Inform Assoc 2003; 10:409-18. [PMID: 12807802 PMCID: PMC212777 DOI: 10.1197/jamia.m1357] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.2] [Received: 03/06/2003] [Accepted: 05/13/2003] [Indexed: 01/04/2023]
Abstract
The National Retail Data Monitor receives data daily from 10,000 stores, including pharmacies, that sell health care products. These stores belong to national chains that process sales data centrally and use Universal Product Codes and scanners to collect sales information at the cash register. The high degree of retail sales data automation enables the monitor to collect information from thousands of store locations in near real time for use in public health surveillance. The monitor provides user interfaces that display summary sales data on timelines and maps. Algorithms monitor the data automatically on a daily basis to detect unusual patterns of sales. The project provides the resulting data and analyses, free of charge, to health departments nationwide. Future plans include continued enrollment and support of health departments, developing methods to make the service financially self-supporting, and further refinement of the data collection system to reduce the time latency of data receipt and analysis.
Affiliation(s)
- Michael M Wagner
- The RODS Laboratory, Center for Biomedical Informatics, University of Pittsburgh, Suite 550, 100 Technology Drive, Pittsburgh, PA 15219, USA.
40
Chapman WW, Cooper GF, Hanbury P, Chapman BE, Harrison LH, Wagner MM. Creating a text classifier to detect radiology reports describing mediastinal findings associated with inhalational anthrax and other disorders. J Am Med Inform Assoc 2003; 10:494-503. [PMID: 12807805 PMCID: PMC212787 DOI: 10.1197/jamia.m1330] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.7] [Received: 01/21/2003] [Accepted: 05/13/2003] [Indexed: 11/10/2022]
Abstract
OBJECTIVE The aim of this study was to create a classifier for automatic detection of chest radiograph reports consistent with the mediastinal findings of inhalational anthrax. DESIGN The authors used the Identify Patient Sets (IPS) system to create a key word classifier for detecting reports describing mediastinal findings consistent with anthrax and compared their performances on a test set of 79,032 chest radiograph reports. MEASUREMENTS Area under the ROC curve was the main outcome measure of the IPS classifier. Sensitivity and specificity of an initial IPS model were calculated based on an existing key word search and were compared against a Boolean version of the IPS classifier. RESULTS The IPS classifier received an area under the ROC curve of 0.677 (90% CI = 0.628 to 0.772) with a specificity of 0.99 and maximum sensitivity of 0.35. The initial IPS model attained a specificity of 1.0 and a sensitivity of 0.04. CONCLUSION The IPS system is a useful tool for helping domain experts create a statistical key word classifier for textual reports that is a potentially useful component in surveillance of radiographic findings suspicious for anthrax.
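Area under the ROC curve, the main outcome measure here, can be computed directly from classifier scores through its equivalence with the Mann-Whitney statistic: it is the probability that a randomly chosen positive report outscores a randomly chosen negative one, with ties counted as one half. The sketch below is a quadratic-time illustration; for a test set on the order of 79,032 reports, a sort-based rank computation would be the practical choice.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive report ranked above negative report
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))
```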
Affiliation(s)
- Wendy Webber Chapman
- Center for Biomedical Informatics, University of Pittsburgh, Suite 8084 Forbes Tower, Pittsburgh, PA 15213, USA.
41
Tsui FC, Espino JU, Dato VM, Gesteland PH, Hutman J, Wagner MM. Technical description of RODS: a real-time public health surveillance system. J Am Med Inform Assoc 2003; 10:399-408. [PMID: 12807803 PMCID: PMC212776 DOI: 10.1197/jamia.m1345] [Citation(s) in RCA: 177] [Impact Index Per Article: 8.4] [Received: 02/19/2003] [Accepted: 05/13/2003] [Indexed: 11/10/2022]
Abstract
This report describes the design and implementation of the Real-time Outbreak and Disease Surveillance (RODS) system, a computer-based public health surveillance system for early detection of disease outbreaks. Hospitals send RODS data from clinical encounters over virtual private networks and leased lines using the Health Level 7 (HL7) message protocol. The data are sent in real time. RODS automatically classifies the registration chief complaint from the visit into one of seven syndrome categories using Bayesian classifiers. It stores the data in a relational database, aggregates the data for analysis using data warehousing techniques, applies univariate and multivariate statistical detection algorithms to the data, and alerts users when the algorithms identify anomalous patterns in the syndrome counts. RODS also has a Web-based user interface that supports temporal and spatial analyses. RODS processes sales of over-the-counter health care products in a similar manner but receives such data in batch mode on a daily basis. RODS was used during the 2002 Winter Olympics and currently operates in two states, Pennsylvania and Utah. It has been and continues to be a resource for implementing, evaluating, and applying new methods of public health surveillance.
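Aggregating classified encounters into syndrome counts by day and region, the step that precedes the detection algorithms, can be sketched with a counter keyed on each dimension. The tuple layout `(day, zip_code, syndrome)` and the function names are assumptions for illustration; RODS itself used a relational database and data warehousing techniques for this step.

```python
from collections import Counter


def aggregate_counts(encounters):
    """Count classified encounters per (day, zip code, syndrome) cell."""
    return Counter((day, zip_code, syndrome) for day, zip_code, syndrome in encounters)


def syndrome_series(counts, zip_code, syndrome, days):
    """Daily counts for one zip code and syndrome; missing cells read as 0."""
    return [counts[(day, zip_code, syndrome)] for day in days]


def rollup_by_syndrome(counts, syndrome, days):
    """Sum over all zip codes to get a region-wide daily series for one syndrome."""
    totals = Counter()
    for (day, _zip_code, synd), n in counts.items():
        if synd == syndrome:
            totals[day] += n
    return [totals[day] for day in days]
```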
Affiliation(s)
- Fu-Chiang Tsui
- The RODS Laboratory, Center for Biomedical Informatics, Suite 8084 Forbes Tower, 200 Lothrop Street, Pittsburgh, PA 15261, USA.
42
Espino JU, Hogan WR, Wagner MM. Telephone triage: a timely data source for surveillance of influenza-like diseases. AMIA Annu Symp Proc 2003; 2003:215-9. [PMID: 14728165 PMCID: PMC1480215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2023]
Abstract
We evaluated telephone triage (TT) data for public health early warning systems. TT data are electronically available and contain coded elements that include the caller's demographics and a description of the caller's medical complaints. In the study, we obtained emergency room TT data and after-hours TT data from a commercial TT software and service company. We compared the timeliness of the TT data with influenza surveillance data from the Centers for Disease Control and Prevention (CDC) using the cross-correlation function. Emergency room TT calls lead the surveillance data collected by the CDC by one to five weeks.
Affiliation(s)
- Jeremy U Espino
- RODS Laboratory, Center for Biomedical Informatics, University of Pittsburgh, Pennsylvania, USA
43
Ma L, Tsui FC, Hogan WR, Wagner MM, Ma H. A framework for infection control surveillance using association rules. AMIA Annu Symp Proc 2003; 2003:410-4. [PMID: 14728205 PMCID: PMC1480000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2023]
Abstract
Surveillance of antibiotic resistance and nosocomial infections is one of the most important functions of a hospital infection control program. We employed the association rule method for automatically identifying new, unexpected, and potentially interesting patterns in hospital infection control. We hypothesized that mining for low-support, low-confidence rules would detect unexpected outbreaks caused by a small number of cases. To build a framework, we preprocessed the data and added new templates to eliminate uninteresting patterns. We applied our method to the culture data collected over 3 months from 10 hospitals in the UPMC Health System. We found that the new process and system are efficient and effective in identifying new, unexpected, and potentially interesting patterns in surveillance data. The clinical relevance and utility of this process await the results of prospective studies.
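Support and confidence, the two thresholds the hypothesis turns on, can be made concrete with a small sketch. Each culture is modeled as a set of attribute strings (a hypothetical encoding such as `"ward=ICU"` or `"org=MRSA"`), and only single-item rules are enumerated; a full miner such as Apriori would also handle larger itemsets, and the study's templates for pruning uninteresting patterns are not reproduced.

```python
from itertools import combinations


def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= t)          # cultures matching the antecedent
    n_ac = sum(1 for t in transactions if (a | c) <= t)   # cultures matching both sides
    support = n_ac / len(transactions)
    confidence = n_ac / n_a if n_a else 0.0
    return support, confidence


def pairwise_rules(transactions, min_support, min_confidence):
    """Enumerate single-item rules X -> Y that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in combinations(items, 2):
        for lhs, rhs in ((x, y), (y, x)):
            s, c = rule_metrics(transactions, {lhs}, {rhs})
            if s >= min_support and c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules
```

Lowering `min_support` and `min_confidence` is what lets rules backed by only a handful of cultures surface, at the price of many more candidate rules to filter.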
Affiliation(s)
- Lili Ma
- Center of Biomedical Informatics, University of Pittsburgh, PA, USA
44
Zhang J, Tsui FC, Wagner MM, Hogan WR. Detection of outbreaks from time series data using wavelet transform. AMIA Annu Symp Proc 2003; 2003:748-52. [PMID: 14728273 PMCID: PMC1479935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2023]
Abstract
In this paper, we developed a new approach to detection of disease outbreaks based on the wavelet transform. It is capable of dealing with two problems found in real-world time series data, namely negative singularities and long-term trends, which may degrade the performance of current approaches to outbreak detection. To test this approach, we introduced artificial disease outbreaks and negative singularities into a real-world dataset and applied it and two other algorithms, autoregressive (AR) and Multi-resolution Wavelet Auto-regressive (MWAR), to this dataset. We compared the performance of these algorithms in terms of sensitivity, specificity, and timeliness. The results showed that our approach had similar sensitivity and specificity and slightly better timeliness compared to the other two algorithms. When we introduced negative singularities, its performance did not degrade as much as the other two algorithms' performance. We conclude that our approach to detection, when compared to traditional approaches, may not be as susceptible to degradation of performance caused by negative singularities.
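The idea of separating a long-term trend from short-term detail with a wavelet transform can be illustrated with the Haar wavelet, whose approximation at depth J, reconstructed back to the original length, is simply the mean of each block of 2^J points. The sketch below subtracts that trend and flags large positive residuals. It illustrates wavelet detrending under these assumptions and is not the algorithm evaluated in the paper; the threshold `k` is arbitrary.

```python
import math


def haar_trend(x, levels):
    """Piecewise-constant trend: the Haar approximation at depth `levels`."""
    block = 2 ** levels
    if len(x) % block:
        raise ValueError("length must be a multiple of 2**levels")
    trend = []
    for start in range(0, len(x), block):
        chunk = x[start:start + block]
        trend.extend([sum(chunk) / block] * block)  # block mean repeated
    return trend


def detail_alarms(x, levels, k=3.0):
    """Indices whose detrended residual exceeds k standard deviations above the mean."""
    trend = haar_trend(x, levels)
    residual = [v - t for v, t in zip(x, trend)]
    mean = sum(residual) / len(residual)
    sigma = math.sqrt(sum((r - mean) ** 2 for r in residual) / len(residual))
    return [i for i, r in enumerate(residual) if r > mean + k * sigma]
```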
Affiliation(s)
- Jun Zhang
- Center of Biomedical Informatics, University of Pittsburgh, PA 15260, USA
45
Ivanov O, Gesteland PH, Hogan W, Mundorff MB, Wagner MM. Detection of pediatric respiratory and gastrointestinal outbreaks from free-text chief complaints. AMIA Annu Symp Proc 2003; 2003:318-22. [PMID: 14728186 PMCID: PMC1480317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2023]
Abstract
We conducted a retrospective study to ascertain the potential of free-text chief complaints collected in pediatric emergency departments to serve as surveillance data for early detection of outbreaks. We determined that automatically coded chief complaint data provide a signal that reflects outbreaks in a population of children less than five years of age. Using the Exponentially Weighted Moving Average (EWMA) detection algorithm, we measured the timeliness, sensitivity, and specificity of free-text chief complaints for predicting outbreaks of pediatric respiratory and gastrointestinal illness. We found that time series of automatically coded free-text chief complaints in pediatric patients correlate well with hospital admissions and precede them by a mean of 10.3 days (95% CI -15.15, 35.5) for respiratory outbreaks and 29 days (95% CI 4.23, 53.7) for gastrointestinal outbreaks. We conclude that free-text chief complaints may play an important role as an early, sensitive, and specific indicator of outbreaks of respiratory and gastrointestinal illness in children less than five years of age.
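The Exponentially Weighted Moving Average detector named here can be sketched as follows. The smoothing weight `lam`, the 28-day moving baseline, and the 3-sigma limit are illustrative assumptions rather than the study's settings; the control limit uses the standard asymptotic EWMA variance, sigma^2 * lam / (2 - lam).

```python
import math
from statistics import mean, stdev


def ewma_alarms(counts, lam=0.4, baseline=28, k=3.0):
    """Return the indices of days on which the EWMA exceeds its upper control limit."""
    z = mean(counts[:baseline])  # start the EWMA at the initial baseline mean
    alarms = []
    for t in range(baseline, len(counts)):
        window = counts[t - baseline:t]  # the preceding `baseline` days
        mu, sigma = mean(window), stdev(window)
        z = lam * counts[t] + (1 - lam) * z  # exponentially weighted update
        limit = mu + k * sigma * math.sqrt(lam / (2 - lam))
        if sigma > 0 and z > limit:
            alarms.append(t)
    return alarms
```

A smaller `lam` gives more weight to history and favors gradual, sustained increases; a larger `lam` reacts faster to sudden spikes.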
Affiliation(s)
- Oleg Ivanov
- Center for Biomedical Informatics, University of Pittsburgh, PA and University of Utah and Intermountain HealthCare, Salt Lake City, UT, USA
46
Hogan WR, Tsui FC, Ivanov O, Gesteland PH, Grannis S, Overhage JM, Robinson JM, Wagner MM. Detection of pediatric respiratory and diarrheal outbreaks from sales of over-the-counter electrolyte products. J Am Med Inform Assoc 2003; 10:555-62. [PMID: 12925542 PMCID: PMC264433 DOI: 10.1197/jamia.m1377] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.6] [Indexed: 11/10/2022]
Abstract
OBJECTIVE To determine whether sales of electrolyte products contain a signal of outbreaks of respiratory and diarrheal disease in children and, if so, how much earlier that signal appears relative to hospital diagnoses. DESIGN Retrospective analysis was conducted of sales of electrolyte products and hospital diagnoses for six urban regions in three states for the period 1998 through 2001. MEASUREMENTS Presence of signal was ascertained by measuring correlation between electrolyte sales and hospital diagnoses and the temporal relationship that maximized correlation. Earliness was the difference between the date that the exponentially weighted moving average (EWMA) method first detected an outbreak from sales and the date it first detected the outbreak from diagnoses. The coefficient of determination (r2) measured how much variance in earliness resulted from differences in sales' and diagnoses' signal strengths. RESULTS The correlation between electrolyte sales and hospital diagnoses was 0.90 (95% CI, 0.87-0.93) at a time offset of 1.7 weeks (95% CI, 0.50-2.9), meaning that sales preceded diagnoses by 1.7 weeks. EWMA with a nine-sigma threshold detected the 18 outbreaks on average 2.4 weeks (95% CI, 0.1-4.8 weeks) earlier from sales than from diagnoses. Twelve outbreaks were first detected from sales, four were first detected from diagnoses, and two were detected simultaneously. Only 26% of variance in earliness was explained by the relative strength of the sales and diagnoses signals (r2 = 0.26). CONCLUSION Sales of electrolyte products contain a signal of outbreaks of respiratory and diarrheal diseases in children and usually are an earlier signal than hospital diagnoses.
Affiliation(s)
- William R Hogan
- The RODS Laboratory, Center for Biomedical Informatics, University of Pittsburgh, Pennsylvania, USA.
47
Panackal AA, Tsui FC, McMahon J, Wagner MM, Dixon BW, Zubieta J, Phelan M, Mirza S, Morgan J, Jernigan D, Pasculle AW, Rankin JT, Hajjeh RA, Harrison LH. Automatic electronic laboratory-based reporting of notifiable infectious diseases at a large health system. Emerg Infect Dis 2002; 8:685-91. [PMID: 12095435 PMCID: PMC2730325 DOI: 10.3201/eid0807.010493] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.0] [Indexed: 11/23/2022]
Abstract
Electronic laboratory-based reporting, developed by the UPMC Health System, Pittsburgh, Pennsylvania, was evaluated to determine whether it could be integrated into the conventional paper-based reporting system. We reviewed reports of 10 infectious diseases from 8 UPMC hospitals that reported to the Allegheny County Health Department in southwestern Pennsylvania from January 1 to November 26, 2000. Electronic reports were received a median of 4 days earlier than conventional reports. The completeness of reporting was 74% (95% confidence interval [CI] 66% to 81%) for electronic laboratory-based reporting and 65% (95% CI 57% to 73%) for the conventional paper-based reporting system (p>0.05). Most of the reports missed by electronic laboratory-based reporting (88%) were attributable to the use of free text. Automatic reporting was more rapid than, and as complete as, conventional reporting. Using standardized coding and minimizing the use of free text will increase the completeness of electronic laboratory-based reporting.
Affiliation(s)
- Anil A. Panackal
- Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Joan McMahon
- Allegheny County Health Department, Pittsburgh, Pennsylvania, USA
- Bruce W. Dixon
- Allegheny County Health Department, Pittsburgh, Pennsylvania, USA
- Juan Zubieta
- Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Maureen Phelan
- Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Sara Mirza
- Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Juliette Morgan
- Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Daniel Jernigan
- Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- James T. Rankin
- Pennsylvania Department of Health, Harrisburg, Pennsylvania, USA
- Rana A. Hajjeh
- Centers for Disease Control and Prevention, Atlanta, Georgia, USA
48
Lober WB, Karras BT, Wagner MM, Overhage JM, Davidson AJ, Fraser H, Trigg LJ, Mandl KD, Espino JU, Tsui FC. Roundtable on bioterrorism detection: information system-based surveillance. J Am Med Inform Assoc 2002; 9:105-15. [PMID: 11861622 PMCID: PMC344564 DOI: 10.1197/jamia.m1052] [Citation(s) in RCA: 90] [Impact Index Per Article: 4.1] [Received: 11/12/2001] [Accepted: 11/21/2001] [Indexed: 11/10/2022]
Abstract
During the 2001 AMIA Annual Symposium, the Anesthesia, Critical Care, and Emergency Medicine Working Group hosted the Roundtable on Bioterrorism Detection. Sixty-four people attended the roundtable discussion, during which several researchers discussed public health surveillance systems designed to enhance early detection of bioterrorism events. These systems make secondary use of existing clinical, laboratory, paramedical, and pharmacy data or facilitate electronic case reporting by clinicians. This paper combines case reports of six existing systems with discussion of some common techniques and approaches. The purpose of the roundtable discussion was to foster communication among researchers and promote progress by 1) sharing information about systems, including origins, current capabilities, stages of deployment, and architectures; 2) sharing lessons learned during the development and implementation of systems; and 3) exploring cooperative projects, including the sharing of software and data. A mailing list server for these ongoing efforts can be found at http://bt.cirg.washington.edu.
49
Tsui FC, Espino JU, Wagner MM, Gesteland P, Ivanov O, Olszewski RT, Liu Z, Zeng X, Chapman W, Wong WK, Moore A. Data, network, and application: technical description of the Utah RODS Winter Olympic Biosurveillance System. Proc AMIA Symp 2002:815-9. [PMID: 12463938 PMCID: PMC2244477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2023]
Abstract
Given the post-September 11 climate of possible bioterrorist attacks and the high-profile 2002 Winter Olympics in Salt Lake City, Utah, we challenged ourselves to deploy a computer-based, real-time, automated biosurveillance system for Utah, the Utah Real-time Outbreak and Disease Surveillance system (Utah RODS), within six weeks using our existing Real-time Outbreak and Disease Surveillance (RODS) architecture. During the Olympics, Utah RODS received real-time HL7 admission messages from 10 emergency departments and 20 walk-in clinics. It collected free-text chief complaints, used natural language processing to categorize each into one of seven prodrome classes, and provided a web interface for real-time display of time-series graphs, geographic information system output, outbreak algorithm alerts, and case details. The system detected two possible outbreaks, both of which were dismissed as the natural result of increasing rates of influenza. Utah RODS allowed us to better understand the complexities underlying the rapid deployment of a RODS-like system.
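The data path described in this abstract (HL7 admission messages in, free-text chief complaints classified into prodromes, counts feeding time series) can be sketched as below. This is a minimal illustration under stated assumptions, not the Utah RODS implementation: it assumes HL7 version 2 pipe-delimited messages with the chief complaint carried in PV2-3 (Admit Reason), which varies by sending system, and it swaps in a simple keyword matcher with three hypothetical classes in place of the RODS natural language processing classifier and its seven prodrome classes. All function and variable names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

# Hypothetical keyword lists for illustration only. They are NOT the RODS
# classifier or its seven prodrome classes. The first matching class wins.
PRODROME_KEYWORDS: Dict[str, List[str]] = {
    "respiratory": ["cough", "shortness of breath", "wheez"],
    "gastrointestinal": ["vomit", "diarrhea", "nausea"],
    "rash": ["rash", "hives"],
}


def parse_segments(message: str) -> Dict[str, List[str]]:
    """Split an HL7 v2 message into {segment id: fields}.

    Segments are separated by carriage returns and fields by '|'. Only the
    first occurrence of each segment type is kept. For MSH, fields[0] is
    'MSH' and fields[1] is MSH-2, because MSH-1 is the '|' separator itself.
    """
    segments: Dict[str, List[str]] = {}
    for raw in message.replace("\n", "\r").split("\r"):
        if raw:
            fields = raw.split("|")
            segments.setdefault(fields[0], fields)
    return segments


def chief_complaint(segments: Dict[str, List[str]]) -> Optional[str]:
    """Free-text complaint, ASSUMED to be in PV2-3 (Admit Reason).

    Prefers the text component (component 2) and falls back to
    component 1 when no text is present.
    """
    pv2 = segments.get("PV2")
    if pv2 is None or len(pv2) <= 3 or not pv2[3]:
        return None
    components = pv2[3].split("^")
    text = components[1] if len(components) > 1 and components[1] else components[0]
    return text.strip() or None


def classify(complaint: str) -> str:
    """Map a complaint to the first class whose keyword appears, else 'other'."""
    text = complaint.lower()
    for prodrome, keywords in PRODROME_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return prodrome
    return "other"


def count_prodromes(messages: Iterable[str]) -> Counter:
    """Count classified complaints in ADT (admission/registration) messages."""
    counts: Counter = Counter()
    for message in messages:
        segments = parse_segments(message)
        msh = segments.get("MSH")
        # MSH-9 (message type) is fields[8] because of the MSH-1 offset.
        if msh is None or len(msh) <= 8 or not msh[8].startswith("ADT"):
            continue
        complaint = chief_complaint(segments)
        if complaint is not None:
            counts[classify(complaint)] += 1
    return counts
```

In a surveillance setting, counts like these would be aggregated per day and per region to produce the time series on which graphs and outbreak-detection alerts are based.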
Affiliation(s)
- Fu-Chiang Tsui
- Center for Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
50
Abstract
The events that followed the launch of Sputnik on Oct 4, 1957, provide a metaphor for the events that are following the first bioterrorist case of pulmonary anthrax in the United States. This paper uses that metaphor to elucidate the nature of the task ahead and to pose questions such as: Can the goals of the biodefense effort be formulated as concisely and concretely as the goal of the space program? Can we measure success in biodefense as we did for the space program? What existing resources are the equivalents of propulsion systems and rocket engineers, and how can they be applied to the problems of biodefense?
Affiliation(s)
- Michael M Wagner
- RODS Laboratory, Center for Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, USA.