1
Ren H, Ling Y, Cao R, Wang Z, Li Y, Huang T. Early warning of emerging infectious diseases based on multimodal data. BIOSAFETY AND HEALTH 2023; 5:S2590-0536(23)00074-5. [PMID: 37362865 PMCID: PMC10245235 DOI: 10.1016/j.bsheal.2023.05.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/08/2023] [Revised: 05/18/2023] [Accepted: 05/31/2023] [Indexed: 06/28/2023] Open
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has dramatically increased the awareness of emerging infectious diseases. The advancement of multiomics analysis technology has resulted in the development of several databases containing virus information. Several scientists have integrated existing data on viruses to construct phylogenetic trees and predict virus mutation and transmission in different ways, providing prospective technical support for epidemic prevention and control. This review summarized the databases of known emerging infectious viruses and techniques focusing on virus variant forecasting and early warning. It focuses on the multi-dimensional information integration and database construction of emerging infectious viruses, virus mutation spectrum construction and variant forecast model, analysis of the affinity between mutation antigen and the receptor, propagation model of virus dynamic evolution, and monitoring and early warning for variants. As people have suffered from COVID-19 and repeated flu outbreaks, we focused on the research results of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and influenza viruses. This review comprehensively viewed the latest virus research and provided a reference for future virus prevention and control research.
Affiliation(s)
- Haotian Ren
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yunchao Ling
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Ruifang Cao
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Zhen Wang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yixue Li
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
  - School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
  - Guangzhou Laboratory, Guangzhou 510005, China
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
  - Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai 200433, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
2
Poirier C, Bouzillé G, Bertaud V, Cuggia M, Santillana M, Lavenu A. Gastroenteritis Forecasting Assessing the Use of Web and Electronic Health Record Data With a Linear and a Nonlinear Approach: Comparison Study. JMIR Public Health Surveill 2023; 9:e34982. [PMID: 36719726 PMCID: PMC9929730 DOI: 10.2196/34982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Revised: 07/19/2022] [Accepted: 11/28/2022] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND Disease surveillance systems capable of producing accurate real-time and short-term forecasts can help public health officials design timely public health interventions to mitigate the effects of disease outbreaks in affected populations. In France, existing clinic-based disease surveillance systems produce gastroenteritis activity information that lags real time by 1 to 3 weeks. This temporal data gap prevents public health officials from having a timely epidemiological characterization of this disease at any point in time and thus leads to the design of interventions that do not take into consideration the most recent changes in dynamics. OBJECTIVE The goal of this study was to evaluate the feasibility of using internet search query trends and electronic health records to predict acute gastroenteritis (AG) incidence rates in near real time, at the national and regional scales, and for long-term forecasts (up to 10 weeks). METHODS We present 2 different approaches (linear and nonlinear) that produce real-time estimates, short-term forecasts, and long-term forecasts of AG activity at 2 different spatial scales in France (national and regional). Both approaches leverage disparate data sources that include disease-related internet search activity, electronic health record data, and historical disease activity. RESULTS Our results suggest that all data sources contribute to improving gastroenteritis surveillance for long-term forecasts with the prominent predictive power of historical data owing to the strong seasonal dynamics of this disease. CONCLUSIONS The methods we developed could help reduce the impact of the AG peak by making it possible to anticipate increased activity by up to 10 weeks.
Affiliation(s)
- Canelle Poirier
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
  - Institut national de la santé et de la recherche médicale U1099, Rennes, France
  - Laboratoire Traitement du Signal et de l'Image, Université de Rennes 1, Rennes, France
  - Centre de Données Cliniques, Centre Hospitalier Universitaire Rennes, Rennes, France
- Guillaume Bouzillé
  - Institut national de la santé et de la recherche médicale U1099, Rennes, France
  - Laboratoire Traitement du Signal et de l'Image, Université de Rennes 1, Rennes, France
  - Centre de Données Cliniques, Centre Hospitalier Universitaire Rennes, Rennes, France
- Valérie Bertaud
  - Institut national de la santé et de la recherche médicale U1099, Rennes, France
  - Laboratoire Traitement du Signal et de l'Image, Université de Rennes 1, Rennes, France
  - Centre de Données Cliniques, Centre Hospitalier Universitaire Rennes, Rennes, France
- Marc Cuggia
  - Institut national de la santé et de la recherche médicale U1099, Rennes, France
  - Laboratoire Traitement du Signal et de l'Image, Université de Rennes 1, Rennes, France
  - Centre de Données Cliniques, Centre Hospitalier Universitaire Rennes, Rennes, France
- Mauricio Santillana
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
  - Harvard Tseng-Hsi Chan School of Public Health, Boston, MA, United States
  - Machine Intelligence Group for the Betterment of Health and the Environment, Network Science Institute, Northeastern University, Boston, MA, United States
- Audrey Lavenu
  - Faculté de médecine, Université de Rennes 1, Rennes, France
  - Institut de Recherche Mathématique de Rennes, Rennes, France
  - Institut national de la santé et de la recherche médicale CIC 1414, Université de Rennes 1, Rennes, France
3
Stolerman LM, Clemente L, Poirier C, Parag KV, Majumder A, Masyn S, Resch B, Santillana M. Using digital traces to build prospective and real-time county-level early warning systems to anticipate COVID-19 outbreaks in the United States. SCIENCE ADVANCES 2023; 9:eabq0199. [PMID: 36652520 PMCID: PMC9848273 DOI: 10.1126/sciadv.abq0199] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/12/2022] [Accepted: 12/19/2022] [Indexed: 06/17/2023]
Abstract
Coronavirus disease 2019 (COVID-19) continues to affect the world, and the design of strategies to curb disease outbreaks requires close monitoring of their trajectories. We present machine learning methods that leverage internet-based digital traces to anticipate sharp increases in COVID-19 activity in U.S. counties. In a complementary direction to the efforts led by the Centers for Disease Control and Prevention (CDC), our models are designed to detect the time when an uptrend in COVID-19 activity will occur. Motivated by the need for finer spatial resolution epidemiological insights, we build upon previous efforts conceived at the state level. Our methods (tested in an out-of-sample manner, as events were unfolding, in 97 counties representative of multiple population sizes across the United States) frequently anticipated increases in COVID-19 activity 1 to 6 weeks before local outbreaks, defined as periods when the effective reproduction number Rt remains larger than 1 for 2 weeks.
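The outbreak definition in this abstract (Rt larger than 1 for a period of 2 weeks) can be illustrated with a minimal sketch. It is not the authors' code: the function name `outbreak_onset`, the daily sampling of Rt, and the 14-day default window are assumptions made only for illustration.

```python
from typing import Optional, Sequence


def outbreak_onset(rt: Sequence[float], window: int = 14) -> Optional[int]:
    """Return the index of the first day that begins `window` consecutive
    days with Rt > 1, or None if no such run exists.

    Assumes `rt` holds one Rt estimate per day (hypothetical input format).
    """
    run = 0  # length of the current streak of days with Rt > 1
    for i, value in enumerate(rt):
        run = run + 1 if value > 1.0 else 0
        if run == window:
            return i - window + 1  # first day of the qualifying streak
    return None


# Five quiet days followed by fourteen days above the threshold.
print(outbreak_onset([0.9] * 5 + [1.2] * 14))  # prints 5
```

A streak that is interrupted by even one day at or below 1 restarts the count, which is one plausible reading of "for a period of 2 weeks"; other readings (for example, a 2-week average) would need a different rule.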
Affiliation(s)
- Lucas M. Stolerman
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Department of Mathematics, Oklahoma State University, Stillwater, OK, USA
- Leonardo Clemente
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Machine Intelligence Group for the Betterment of Health and the Environment, Network Science Institute, Northeastern University, Boston, MA, USA
- Canelle Poirier
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Kris V. Parag
  - NIHR Health Protection Research Unit, Behavioural Science and Evaluation, University of Bristol, Bristol, UK
- Serge Masyn
  - Global Public Health, Janssen R&D, Beerse, Belgium
- Bernd Resch
  - Department of Geoinformatics - Z-GIS, University of Salzburg, Salzburg, Austria
  - Center for Geographic Analysis, Harvard University, Cambridge, MA, USA
- Mauricio Santillana
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Machine Intelligence Group for the Betterment of Health and the Environment, Network Science Institute, Northeastern University, Boston, MA, USA
  - Harvard University, T.H. Chan School of Public Health, Boston, MA, USA
4
Eum Y, Yoo EH. Using GPS-enabled mobile phones to evaluate the associations between human mobility changes and the onset of influenza illness. Spat Spatiotemporal Epidemiol 2022; 40:100458. [PMID: 35120680 PMCID: PMC8818086 DOI: 10.1016/j.sste.2021.100458] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/15/2019] [Revised: 09/19/2021] [Accepted: 10/18/2021] [Indexed: 02/03/2023]
Abstract
Due to the challenges in data collection, there are few studies examining how individuals' routine mobility patterns change when they experience influenza-like symptoms (ILS). In the present study, we aimed to assess the association between changes in routine mobility and ILS using mobile phone-based GPS traces and self-reported surveys from 1,155 participants over the 2016-2017 influenza season. We used a set of mobility metrics to capture individuals' routine mobility patterns and matched their weekly ILS survey responses. For a statistical analysis, we used a time-stratified case-crossover analysis and conducted a stratified analysis to examine if such associations are moderated by demographic and socioeconomic factors, such as age, gender, occupational status, neighborhood poverty and education levels, and work type. We found that statistically significant associations existed between reduced routine mobility patterns and the experience of ILS. Results also indicated that the association between reduced mobility and ILS was significant only for female and for participants with high socioeconomic status. Our findings offered an improved understanding of ILS-associated mobility changes at the individual level and suggest the potential of individual mobility data for influenza surveillance.
Affiliation(s)
- Youngseob Eum
  - Department of Geography, State University of New York at Buffalo, Buffalo, NY, USA.
- Eun-Hye Yoo
  - Department of Geography, State University of New York at Buffalo, Buffalo, NY, USA.
5
Epidemic tracking and forecasting: Lessons learned from a tumultuous year. Proc Natl Acad Sci U S A 2021; 118:2111456118. [PMID: 34903658 PMCID: PMC8713795 DOI: 10.1073/pnas.2111456118] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Accepted: 10/23/2021] [Indexed: 01/15/2023] Open
6
Rothman RE, Hsieh YH, DuVal A, Talan DA, Moran GJ, Krishnadasan A, Shaw-Saliba K, Dugas AF. Front-Line Emergency Department Clinician Acceptability and Use of a Prototype Real-Time Cloud-Based Influenza Surveillance System. Front Public Health 2021; 9:740258. [PMID: 34805066 PMCID: PMC8601200 DOI: 10.3389/fpubh.2021.740258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2021] [Accepted: 10/11/2021] [Indexed: 11/13/2022] Open
Abstract
Objectives: To assess emergency department (ED) clinicians' perceptions of a novel real-time influenza surveillance system using a pre- and post-implementation structured survey. Methods: We created and implemented a laboratory-based real-time influenza surveillance system at two EDs at the beginning of the 2013-2014 influenza season. Patients with acute respiratory illness were tested for influenza using rapid PCR-based Cepheid Xpert Flu assay. Results were instantaneously uploaded to a cloud-based data aggregation system made available to clinicians via a web-based dashboard. Clinicians received bimonthly email updates summating year-to-date results. Clinicians were surveyed prior to, and after the influenza season, to assess their views regarding acceptability and utility of the surveillance system data which were shared via dashboard and email updates. Results: The pre-implementation survey revealed that the majority (82%) of the 151 ED clinicians responded that they “sporadically” or “don't,” actively seek influenza-related information during the season. However, most (75%) reported that they would find additional information regarding influenza prevalence useful. Following implementation, there was an overall increase in the frequency of clinician self-reporting increased access to surveillance information from 50 to 63%, with the majority (75%) indicating that the surveillance emails impacted their general awareness of influenza. Clinicians reported that the additional real-time surveillance data impacted their testing (65%) and treatment (51%) practices. Conclusions: The majority of ED clinicians found surveillance data useful and indicated the additional information impacted their clinical practice. Accurate and timely surveillance information, distributed in a provider-friendly format could impact ED clinician management of patients with suspected influenza.
Affiliation(s)
- Richard E Rothman
  - Department of Emergency Medicine, Johns Hopkins University, Baltimore, MD, United States
- Yu-Hsiang Hsieh
  - Department of Emergency Medicine, Johns Hopkins University, Baltimore, MD, United States
- Anna DuVal
  - Department of Emergency Medicine, Johns Hopkins University, Baltimore, MD, United States
- David A Talan
  - Ronald Reagan University of California, Los Angeles (UCLA) Medical Center, Los Angeles, CA, United States
- Gregory J Moran
  - University of California, Olive-View Medical Center, Los Angeles, CA, United States
- Anusha Krishnadasan
  - University of California, Olive-View Medical Center, Los Angeles, CA, United States
- Katy Shaw-Saliba
  - Department of Emergency Medicine, Johns Hopkins University, Baltimore, MD, United States
- Andrea F Dugas
  - Department of Emergency Medicine, Johns Hopkins University, Baltimore, MD, United States
7
Gu Y, DeDoncker E, VanEnk R, Paul R, Peters S, Stoltman G, Prieto D. Accuracy of State-Level Surveillance during Emerging Outbreaks of Respiratory Viruses: A Model-Based Assessment. Med Decis Making 2021; 41:1004-1016. [PMID: 34269123 PMCID: PMC8488654 DOI: 10.1177/0272989x211022276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
It is long perceived that the more data collection, the more knowledge emerges about the real disease progression. During emergencies like the H1N1 and the severe acute respiratory syndrome coronavirus 2 pandemics, public health surveillance requested increased testing to address the exacerbated demand. However, it is currently unknown how accurately surveillance portrays disease progression through incidence and confirmed case trends. State surveillance, unlike commercial testing, can process specimens based on the upcoming demand (e.g., with testing restrictions). Hence, proper assessment of accuracy may lead to improvements for a robust infrastructure. Using the H1N1 pandemic experience, we developed a simulation that models the true unobserved influenza incidence trend in the State of Michigan, as well as trends observed at different data collection points of the surveillance system. We calculated the growth rate, or speed at which each trend increases during the pandemic growth phase, and we performed statistical experiments to assess the biases (or differences) between growth rates of unobserved and observed trends. We highlight the following results: 1) emergency-driven high-risk perception increases reporting, which leads to reduction of biases in the growth rates; 2) the best predicted growth rates are those estimated from the trend of specimens submitted to the surveillance point that receives reports from a variety of health care providers; and 3) under several criteria to queue specimens for viral subtyping with limited capacity, the best-performing criterion was to queue first-come, first-serve restricted to specimens with higher hospitalization risk. Under this criterion, the lab released capacity to subtype specimens for each day in the trend, which reduced the growth rate bias the most compared to other queuing criteria. Future research should investigate additional restrictions to the queue.
Affiliation(s)
- Yuwen Gu
  - Western Michigan University, Kalamazoo, MI, USA
- Rajib Paul
  - University of North Carolina Charlotte College of Health and Human Services, Charlotte, NC, USA
- Susan Peters
  - Michigan State University College of Human Medicine, East Lansing, MI, USA
- Gillian Stoltman
  - Western Michigan University Homer Stryker MD School of Medicine, Kalamazoo, MI
- Diana Prieto
  - Johns Hopkins University Carey Business School, Baltimore, MD, USA
8
Gabaldon-Figueira JC, Brew J, Doré DH, Umashankar N, Chaccour J, Orrillo V, Tsang LY, Blavia I, Fernández-Montero A, Bartolomé J, Grandjean Lapierre S, Chaccour C. Digital acoustic surveillance for early detection of respiratory disease outbreaks in Spain: a protocol for an observational study. BMJ Open 2021; 11:e051278. [PMID: 34215614 PMCID: PMC8257291 DOI: 10.1136/bmjopen-2021-051278] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 11/03/2022] Open
Abstract
INTRODUCTION Cough is a common symptom of COVID-19 and other respiratory illnesses. However, objectively measuring its frequency and evolution is hindered by the lack of reliable and scalable monitoring systems. This can be overcome by newly developed artificial intelligence models that exploit the portability of smartphones. In the context of the ongoing COVID-19 pandemic, cough detection for respiratory disease syndromic surveillance represents a simple means for early outbreak detection and disease surveillance. In this protocol, we evaluate the ability of population-based digital cough surveillance to predict the incidence of respiratory diseases at population level in Navarra, Spain, while assessing individual determinants of uptake of these platforms. METHODS AND ANALYSIS Participants in the Cendea de Cizur, Zizur Mayor or attending the local University of Navarra (Pamplona) will be invited to monitor their night-time cough using the smartphone app Hyfe Cough Tracker. Detected coughs will be aggregated in time and space. Incidence of COVID-19 and other diagnosed respiratory diseases within the participants cohort, and the study area and population will be collected from local health facilities and used to carry out an autoregressive moving average analysis on those independent time series. In a mixed-methods design, we will explore barriers and facilitators of continuous digital cough monitoring by evaluating participation patterns and sociodemographic characteristics. Participants will fill an acceptability questionnaire and a subgroup will participate in focus group discussions. ETHICS AND DISSEMINATION Ethics approval was obtained from the ethics committee of the Centre Hospitalier de l'Université de Montréal, Canada and the Medical Research Ethics Committee of Navarre, Spain. Preliminary findings will be shared with civil and health authorities and reported to individual participants. Results will be submitted for publication in peer-reviewed scientific journals and international conferences. TRIAL REGISTRATION NUMBER NCT04762693.
Affiliation(s)
- Joe Brew
  - Research and Development Department, Hyfe, Wilmington, Delaware, USA
- Dominique Hélène Doré
  - Immunopathology Axis, Research Center of the University of Montreal Hospital Center, Montréal, Québec, Canada
- Nita Umashankar
  - Fowler College of Business, San Diego State University, San Diego, California, USA
- Juliane Chaccour
  - Infectious Diseases Area, University of Navarra Clinic, Pamplona, Spain
- Virginia Orrillo
  - School of Pharmacy and Nutrition, University of Navarra, Pamplona, Spain
- Lai Yu Tsang
  - Global Health Institute, Stony Brook University, Stony Brook, New York, USA
- Isabel Blavia
  - School of Pharmacy and Nutrition, University of Navarra, Pamplona, Spain
- Javier Bartolomé
  - Primary Healthcare, Navarre Health Service-Osasunbidea, Zizur Mayor, Spain
- Simon Grandjean Lapierre
  - Immunopathology Axis, Research Center of the University of Montreal Hospital Center, Montréal, Québec, Canada
  - Department of Microbiology, Infectious Diseases and Immunology, Research Center of the University of Montreal Hospital Center, Montreal, Québec, Canada
- C Chaccour
  - Infectious Diseases Area, University of Navarra Clinic, Pamplona, Spain
  - ISGlobal, Hospital Clinic, University of Barcelona, Barcelona, Spain
  - Ifakara Institute of Health, Ifakara Institute of Health, Ifakara, Tanzania
9
A novel data-driven methodology for influenza outbreak detection and prediction. Sci Rep 2021; 11:13275. [PMID: 34168200 PMCID: PMC8225876 DOI: 10.1038/s41598-021-92484-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2020] [Accepted: 06/08/2021] [Indexed: 12/01/2022] Open
Abstract
Influenza is an infectious disease that leads to an estimated 5 million cases of severe illness and 650,000 respiratory deaths worldwide each year. The early detection and prediction of influenza outbreaks are crucial for efficient resource planning to save patient’s lives and healthcare costs. We propose a new data-driven methodology for influenza outbreak detection and prediction at very local levels. A doctor’s diagnostic dataset of influenza-like illness from more than 3000 clinics in Malaysia is used in this study because these diagnostic data are reliable and can be captured promptly. A new region index (RI) of the influenza outbreak is proposed based on the diagnostic dataset. By analysing the anomalies in the weekly RI value, potential outbreaks are identified using statistical methods. An ensemble learning method is developed to predict potential influenza outbreaks. Cross-validation is conducted to optimize the hyperparameters of the ensemble model. A testing data set is used to provide an unbiased evaluation of the model. The proposed methodology is shown to be sensitive and accurate at influenza outbreak prediction, with average of 75% recall, 74% precision, and 83% accuracy scores across five regions in Malaysia. The results are also validated by Google Flu Trends data, news reports, and surveillance data released by World Health Organization.
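The recall, precision, and accuracy figures quoted in this abstract are standard scores for binary outbreak predictions. The sketch below shows how such scores are computed from weekly labels. The function name `outbreak_scores` and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def outbreak_scores(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute recall, precision and accuracy for binary weekly labels
    (1 = outbreak week, 0 = non-outbreak week)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # outbreaks correctly flagged
    fp = sum(1 for t, p in pairs if not t and p)      # false alarms
    fn = sum(1 for t, p in pairs if t and not p)      # missed outbreaks
    tn = sum(1 for t, p in pairs if not t and not p)  # quiet weeks correctly ignored
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Toy labels for eight weeks (illustrative only, not study data).
truth = [1, 1, 0, 0, 1, 0, 0, 0]
preds = [1, 0, 0, 1, 1, 0, 0, 0]
print(outbreak_scores(truth, preds))
```

Reporting all three scores matters for outbreak data because quiet weeks usually dominate, so accuracy alone can look high even when many outbreaks are missed.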
10
Lu FS, Nguyen AT, Link NB, Molina M, Davis JT, Chinazzi M, Xiong X, Vespignani A, Lipsitch M, Santillana M. Estimating the cumulative incidence of COVID-19 in the United States using influenza surveillance, virologic testing, and mortality data: Four complementary approaches. PLoS Comput Biol 2021; 17:e1008994. [PMID: 34138845 PMCID: PMC8241061 DOI: 10.1371/journal.pcbi.1008994] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 06/19/2020] [Revised: 06/29/2021] [Accepted: 04/22/2021] [Indexed: 12/20/2022] Open
Abstract
Effectively designing and evaluating public health responses to the ongoing COVID-19 pandemic requires accurate estimation of the prevalence of COVID-19 across the United States (US). Equipment shortages and varying testing capabilities have however hindered the usefulness of the official reported positive COVID-19 case counts. We introduce four complementary approaches to estimate the cumulative incidence of symptomatic COVID-19 in each state in the US as well as Puerto Rico and the District of Columbia, using a combination of excess influenza-like illness reports, COVID-19 test statistics, COVID-19 mortality reports, and a spatially structured epidemic model. Instead of relying on the estimate from a single data source or method that may be biased, we provide multiple estimates, each relying on different assumptions and data sources. Across our four approaches emerges the consistent conclusion that on April 4, 2020, the estimated case count was 5 to 50 times higher than the official positive test counts across the different states. Nationally, our estimates of COVID-19 symptomatic cases as of April 4 have a likely range of 2.3 to 4.8 million, with possibly as many as 7.6 million cases, up to 25 times greater than the cumulative confirmed cases of about 311,000. Extending our methods to May 16, 2020, we estimate that cumulative symptomatic incidence ranges from 4.9 to 10.1 million, as opposed to 1.5 million positive test counts. The proposed combination of approaches may prove useful in assessing the burden of COVID-19 during resurgences in the US and other countries with comparable surveillance systems.
Affiliation(s)
- Fred S. Lu
  - Department of Statistics, Stanford University, Stanford, California, United States of America
- Andre T. Nguyen
  - University of Maryland, Baltimore County, Baltimore, Maryland, United States of America
  - Booz Allen Hamilton, Columbia, Maryland, United States of America
- Nicholas B. Link
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
- Mathieu Molina
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
- Jessica T. Davis
  - Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, Massachusetts, United States of America
- Matteo Chinazzi
  - Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, Massachusetts, United States of America
- Xinyue Xiong
  - Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, Massachusetts, United States of America
- Alessandro Vespignani
  - Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, Massachusetts, United States of America
- Marc Lipsitch
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Mauricio Santillana
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, United States of America
11
Aiken EL, Nguyen AT, Viboud C, Santillana M. Toward the use of neural networks for influenza prediction at multiple spatial resolutions. SCIENCE ADVANCES 2021; 7:7/25/eabb1237. [PMID: 34134985 PMCID: PMC8208709 DOI: 10.1126/sciadv.abb1237] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/31/2020] [Accepted: 04/29/2021] [Indexed: 05/24/2023]
Abstract
Mitigating the effects of disease outbreaks with timely and effective interventions requires accurate real-time surveillance and forecasting of disease activity, but traditional health care-based surveillance systems are limited by inherent reporting delays. Machine learning methods have the potential to fill this temporal "data gap," but work to date in this area has focused on relatively simple methods and coarse geographic resolutions (state level and above). We evaluate the predictive performance of a gated recurrent unit neural network approach in comparison with baseline machine learning methods for estimating influenza activity in the United States at the state and city levels and experiment with the inclusion of real-time Internet search data. We find that the neural network approach improves upon baseline models for long time horizons of prediction but is not improved by real-time internet search data. We conduct a thorough analysis of feature importances in all considered models for interpretability purposes.
Affiliation(s)
- Emily L Aiken
  - School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA.
- Andre T Nguyen
  - Booz Allen Hamilton, Columbia, MD 21044, USA
  - University of Maryland, Baltimore County, Baltimore, MD 21250, USA
- Cecile Viboud
  - Fogarty International Center, National Institutes of Health, Bethesda, MD 20892, USA
- Mauricio Santillana
  - School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA.
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02215, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA 02215, USA
12
Choi H, Choi WS, Han E. Suggestion of a simpler and faster influenza-like illness surveillance system using 2014-2018 claims data in Korea. Sci Rep 2021; 11:11243. [PMID: 34045533 PMCID: PMC8159991 DOI: 10.1038/s41598-021-90511-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/11/2020] [Accepted: 05/06/2021] [Indexed: 11/10/2022] Open
Abstract
Influenza is an important public health concern. We propose a new real-time influenza-like illness (ILI) surveillance system that utilizes nationwide prospective drug utilization monitoring in Korea. We defined ILI-related claims as outpatient claims that contain both antipyretic and antitussive agents and calculated the weekly rate of ILI-related claims, which was compared to weekly ILI rates from clinical sentinel surveillance data during 2014-2018. We performed a cross-correlation analysis using Pearson's correlation and a time-series analysis to explore actual correlations after removing any dubious correlations due to underlying non-stationarity in both data sets. We used the moving epidemic method (MEM) to estimate an absolute threshold to designate potential influenza epidemics for the weeks with incidence rates above the threshold. We observed a strong correlation between the two surveillance systems each season. The absolute thresholds for the 4 years were 84.64 and 86.19 cases per 1000 claims for claims data and 12.27 and 16.82 per 1000 patients for sentinel data. The epidemic patterns were more similar in the 2016-2017 and 2017-2018 seasons than the 2014-2015 and 2015-2016 seasons. ILI claims data can be loaded to a drug utilization review system in Korea to make an influenza surveillance system.
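The cross-correlation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: first differencing is assumed here as one common way to reduce non-stationarity before correlating, and the function names and the toy weekly series are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def difference(series):
    """First difference; an assumed, simple way to remove a shared trend."""
    return [b - a for a, b in zip(series, series[1:])]

def cross_correlation(x, y, max_lag):
    """Return {lag: corr(x[t], y[t + lag])} for lag in -max_lag..max_lag."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:len(x) - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:len(y) + lag]
        result[lag] = pearson(xs, ys)
    return result

# Toy data (hypothetical): the sentinel series repeats the claims series
# one week later, at one tenth of the scale.
claims = [10, 12, 15, 22, 35, 50, 42, 30, 20, 14, 11, 10]
sentinel = [1.0] + [c / 10 for c in claims[:-1]]

cc = cross_correlation(difference(claims), difference(sentinel), max_lag=3)
best_lag = max(cc, key=cc.get)  # lag at which the two series align best
```

A positive `best_lag` means the first series leads the second by that many weeks, which is the property a faster surveillance signal would need to show.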
Affiliation(s)
- HeeKyoung Choi
- College of Pharmacy, Yonsei Institute of Pharmaceutical Research, Yonsei University, 162-1 Songdo-dong, Yeonsu-gu, Incheon, Seoul, Republic of Korea
- Division of Infectious Diseases, Department of Internal Medicine, National Health Insurance Service Ilsan Hospital, Ilsan, Republic of Korea
- Won Suk Choi
- Division of Infectious Diseases, Department of Internal Medicine, Ansan Hospital, Korea University College of Medicine, Ansan, Republic of Korea
- Euna Han
- College of Pharmacy, Yonsei Institute of Pharmaceutical Research, Yonsei University, 162-1 Songdo-dong, Yeonsu-gu, Incheon, Seoul, Republic of Korea
13
Poirier C, Hswen Y, Bouzillé G, Cuggia M, Lavenu A, Brownstein JS, Brewer T, Santillana M. Influenza forecasting for French regions combining EHR, web and climatic data sources with a machine learning ensemble approach. PLoS One 2021; 16:e0250890. [PMID: 34010293 PMCID: PMC8133501 DOI: 10.1371/journal.pone.0250890] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/10/2021] [Accepted: 04/16/2021] [Indexed: 11/25/2022] Open
Abstract
Effective and timely disease surveillance systems have the potential to help public health officials design interventions to mitigate the effects of disease outbreaks. Currently, healthcare-based disease monitoring systems in France offer influenza activity information that lags real-time by one to three weeks. This temporal data gap introduces uncertainty that prevents public health officials from having a timely perspective on the population-level disease activity. Here, we present a machine-learning modeling approach that produces real-time estimates and short-term forecasts of influenza activity for the twelve continental regions of France by leveraging multiple disparate data sources that include Google search activity, real-time and local weather information, flu-related Twitter micro-blogs, electronic health records data, and historical disease activity synchronicities across regions. Our results show that all data sources contribute to improving influenza surveillance and that machine-learning ensembles that combine all data sources lead to accurate and timely predictions.
Affiliation(s)
- Canelle Poirier
- INSERM, U1099, Rennes, France
- Université de Rennes 1, LTSI, Rennes, France
- Department of Pediatrics, Harvard Medical School, Boston, MA, United States of America
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, United States of America
- Yulin Hswen
- Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA, United States of America
- Innovation Program, Boston Children’s Hospital, Boston, MA, United States of America
- Guillaume Bouzillé
- INSERM, U1099, Rennes, France
- Université de Rennes 1, LTSI, Rennes, France
- CHU Rennes, Centre de Données Cliniques, Rennes, France
- Marc Cuggia
- INSERM, U1099, Rennes, France
- Université de Rennes 1, LTSI, Rennes, France
- CHU Rennes, Centre de Données Cliniques, Rennes, France
- Audrey Lavenu
- Université de Rennes 1, Faculté de médecine, Rennes, France
- INSERM CIC 1414, Université de Rennes 1, Rennes, France
- IRMAR, Institut de Recherche Mathématique de Rennes, Rennes, France
- John S. Brownstein
- Innovation Program, Boston Children’s Hospital, Boston, MA, United States of America
- Department of Pediatrics, Harvard Medical School, Boston, MA, United States of America
- Thomas Brewer
- Innovation Program, Boston Children’s Hospital, Boston, MA, United States of America
- Mauricio Santillana
- Department of Pediatrics, Harvard Medical School, Boston, MA, United States of America
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, United States of America
14
Improving influenza surveillance based on multi-granularity deep spatiotemporal neural network. Comput Biol Med 2021; 134:104482. [PMID: 34051452 DOI: 10.1016/j.compbiomed.2021.104482] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/09/2021] [Revised: 04/16/2021] [Accepted: 05/06/2021] [Indexed: 11/23/2022]
Abstract
Influenza is a common respiratory disease that can cause human illness and death. Timely and accurate prediction of disease risk is of great importance for public health management and prevention. The influenza data belong to typical spatiotemporal data in that influenza transmission is influenced by regional and temporal interactions. Many existing methods only use the historical time series information for prediction, which ignores the effect of spatial correlations of neighboring regions and temporal correlations of different time periods. Mining spatiotemporal information for risk prediction is a significant and challenging issue. In this paper, we propose a new end-to-end spatiotemporal deep neural network structure for influenza risk prediction. The proposed model mainly consists of two parts. The first stage is the spatiotemporal feature extraction stage where two-stream convolutional and recurrent neural networks are constructed to extract the different regions and time granularity information. Then, a dynamically parametric-based fusion method is adopted to integrate the two-stream features and make predictions. In our work, we demonstrate that our method, tested on two influenza-like illness (ILI) datasets (US-HHS and SZ-HIC), achieved the best performance across all evaluation metrics. The results imply that our method has outstanding performance for spatiotemporal feature extraction and enables accurate predictions compared to other well-known influenza forecasting models.
15
Kogan NE, Clemente L, Liautaud P, Kaashoek J, Link NB, Nguyen AT, Lu FS, Huybers P, Resch B, Havas C, Petutschnig A, Davis J, Chinazzi M, Mustafa B, Hanage WP, Vespignani A, Santillana M. An early warning approach to monitor COVID-19 activity with multiple digital traces in near real time. Science Advances 2021; 7:eabd6989. [PMID: 33674304 PMCID: PMC7935356 DOI: 10.1126/sciadv.abd6989] [Citation(s) in RCA: 86] [Impact Index Per Article: 21.5] [Received: 07/07/2020] [Accepted: 01/19/2021] [Indexed: 05/18/2023]
Abstract
Given still-high levels of coronavirus disease 2019 (COVID-19) susceptibility and inconsistent transmission-containing strategies, outbreaks have continued to emerge across the United States. Until effective vaccines are widely deployed, curbing COVID-19 will require carefully timed nonpharmaceutical interventions (NPIs). A COVID-19 early warning system is vital for this. Here, we evaluate digital data streams as early indicators of state-level COVID-19 activity from 1 March to 30 September 2020. We observe that increases in digital data stream activity anticipate increases in confirmed cases and deaths by 2 to 3 weeks. Confirmed cases and deaths also decrease 2 to 4 weeks after NPI implementation, as measured by anonymized, phone-derived human mobility data. We propose a means of harmonizing these data streams to identify future COVID-19 outbreaks. Our results suggest that combining disparate health and behavioral data may help identify disease activity changes weeks before observation using traditional epidemiological monitoring.
Affiliation(s)
- Nicole E Kogan
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Leonardo Clemente
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Parker Liautaud
- Department of Earth and Planetary Sciences, Harvard University, Cambridge, MA, USA
- Justin Kaashoek
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Nicholas B Link
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Andre T Nguyen
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- University of Maryland, Baltimore County, Baltimore, MD, USA
- Booz Allen Hamilton, Columbia, MD, USA
- Fred S Lu
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Department of Statistics, Stanford University, Stanford, CA, USA
- Peter Huybers
- Department of Earth and Planetary Sciences, Harvard University, Cambridge, MA, USA
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Bernd Resch
- Department of Geoinformatics - Z_GIS, University of Salzburg, Salzburg, Austria
- Center for Geographic Analysis, Harvard University, Cambridge, MA, USA
- Clemens Havas
- Department of Geoinformatics - Z_GIS, University of Salzburg, Salzburg, Austria
- Andreas Petutschnig
- Department of Geoinformatics - Z_GIS, University of Salzburg, Salzburg, Austria
- Backtosch Mustafa
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- William P Hanage
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Mauricio Santillana
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
|
16
|
Duchemin T, Bastard J, Ante-Testard PA, Assab R, Daouda OS, Duval A, Garsi JP, Lounissi R, Nekkab N, Neynaud H, Smith DRM, Dab W, Jean K, Temime L, Hocine MN. Monitoring sick leave data for early detection of influenza outbreaks. BMC Infect Dis 2021; 21:52. [PMID: 33430793 PMCID: PMC7799403 DOI: 10.1186/s12879-020-05754-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2020] [Accepted: 12/28/2020] [Indexed: 12/03/2022] Open
Abstract
Background: Workplace absenteeism increases significantly during influenza epidemics. Sick leave records may facilitate more timely detection of influenza outbreaks, as trends in increased sick leave may precede alerts issued by sentinel surveillance systems by days or weeks. Sick leave data have not been comprehensively evaluated in comparison to traditional surveillance methods. The aim of this paper is to study the performance and the feasibility of using a detection system based on sick leave data to detect influenza outbreaks. Methods: Sick leave records were extracted from private French health insurance data, covering on average 209,932 companies per year across a wide range of sizes and sectors. We used linear regression to estimate the weekly number of new sick leave spells between 2016 and 2017 in 12 French regions, adjusting for trend, seasonality and worker leaves on historical data from 2010 to 2015. Outbreaks were detected using a 95%-prediction interval. This method was compared to results from the French Sentinelles network, a gold-standard primary care surveillance system currently in place. Results: Using sick leave data, we detected 92% of reported influenza outbreaks between 2016 and 2017, on average 5.88 weeks prior to outbreak peaks. Compared to the existing Sentinelles model, our method had high sensitivity (89%) and positive predictive value (86%), and detected outbreaks on average 2.5 weeks earlier. Conclusion: Sick leave surveillance could be a sensitive, specific and timely tool for detection of influenza outbreaks.
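The detection rule in this abstract (fit a regression on historical weeks, then flag new weeks that exceed the 95% prediction interval) can be sketched as follows. This is a simplified illustration rather than the authors' model: it fits a linear trend only, omits the seasonality and worker-leave adjustments, and uses a normal quantile in place of a Student t quantile. All function names and the weekly counts are hypothetical.

```python
from statistics import NormalDist, mean

def fit_trend(y):
    """Ordinary least squares of y on the week index 0..n-1."""
    n = len(y)
    x = list(range(n))
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom.
    s = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return {"n": n, "xbar": xbar, "sxx": sxx,
            "slope": slope, "intercept": intercept, "s": s}

def upper_bound(model, x0, level=0.95):
    """Upper limit of the two-sided prediction interval at week x0."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation (assumption)
    spread = (1 + 1 / model["n"] + (x0 - model["xbar"]) ** 2 / model["sxx"]) ** 0.5
    return model["intercept"] + model["slope"] * x0 + z * model["s"] * spread

def detect_outbreak_weeks(history, new_weeks, level=0.95):
    """Indices of new weeks whose count exceeds the prediction interval."""
    model = fit_trend(history)
    start = len(history)
    return [i for i, count in enumerate(new_weeks)
            if count > upper_bound(model, start + i, level)]

# Hypothetical weekly counts of new sick leave spells.
history = [100, 103, 98, 105, 101, 107, 104, 109, 106, 111, 108, 113]
new_weeks = [112, 150, 160, 114]
flagged = detect_outbreak_weeks(history, new_weeks)
```

Only exceedances of the upper bound are flagged, since an outbreak signal is an excess of sick leave rather than a deficit.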
Affiliation(s)
- Tom Duchemin
- MESuRS laboratory, Conservatoire National des Arts et Métiers, 292 Rue Saint-Martin, 75003, Paris, France
- Malakoff Humanis, 21 Rue Laffitte, 75009, Paris, France
- Jonathan Bastard
- MESuRS laboratory, Conservatoire National des Arts et Métiers, 292 Rue Saint-Martin, 75003, Paris, France
- Institut Pasteur, Epidemiology and Modelling of Antibiotic Evasion (EMAE), Paris, France
- PACRI unit, Conservatoire National des Arts et Métiers, Institut Pasteur, Paris, France
- Université Paris-Saclay, UVSQ, Inserm, CESP, Anti-infective evasion and pharmacoepidemiology team, Montigny-Le-Bretonneux, France
- Pearl Anne Ante-Testard
- MESuRS laboratory, Conservatoire National des Arts et Métiers, 292 Rue Saint-Martin, 75003, Paris, France
- PACRI unit, Conservatoire National des Arts et Métiers, Institut Pasteur, Paris, France
- Rania Assab
- MESuRS laboratory, Conservatoire National des Arts et Métiers, 292 Rue Saint-Martin, 75003, Paris, France
- Oumou Salama Daouda
- MESuRS laboratory, Conservatoire National des Arts et Métiers, 292 Rue Saint-Martin, 75003, Paris, France
- Audrey Duval
- MESuRS laboratory, Conservatoire National des Arts et Métiers, 292 Rue Saint-Martin, 75003, Paris, France
- Institut Pasteur, Epidemiology and Modelling of Antibiotic Evasion (EMAE), Paris, France
- Université Paris-Saclay, UVSQ, Inserm, CESP, Anti-infective evasion and pharmacoepidemiology team, Montigny-Le-Bretonneux, France
- Biodiversity and Epidemiology of Bacterial Pathogens, Institut Pasteur, Paris, France
- Jérôme-Philippe Garsi
- MESuRS laboratory, Conservatoire National des Arts et Métiers, 292 Rue Saint-Martin, 75003, Paris, France
- Narimane Nekkab
- MESuRS laboratory, Conservatoire National des Arts et Métiers, 292 Rue Saint-Martin, 75003, Paris, France
- Malaria: Parasites and Hosts, Department of Parasites and Insect Vectors, Institut Pasteur, Paris, France
- Helene Neynaud
- MESuRS laboratory, Conservatoire National des Arts et Métiers, 292 Rue Saint-Martin, 75003, Paris, France
- David R M Smith
- MESuRS laboratory, Conservatoire National des Arts et Métiers, 292 Rue Saint-Martin, 75003, Paris, France
- Institut Pasteur, Epidemiology and Modelling of Antibiotic Evasion (EMAE), Paris, France
- Université Paris-Saclay, UVSQ, Inserm, CESP, Anti-infective evasion and pharmacoepidemiology team, Montigny-Le-Bretonneux, France
- William Dab
- MESuRS laboratory, Conservatoire National des Arts et Métiers, 292 Rue Saint-Martin, 75003, Paris, France
- Kevin Jean
- MESuRS laboratory, Conservatoire National des Arts et Métiers, 292 Rue Saint-Martin, 75003, Paris, France
- PACRI unit, Conservatoire National des Arts et Métiers, Institut Pasteur, Paris, France
- Laura Temime
- MESuRS laboratory, Conservatoire National des Arts et Métiers, 292 Rue Saint-Martin, 75003, Paris, France
- PACRI unit, Conservatoire National des Arts et Métiers, Institut Pasteur, Paris, France
- Mounia N Hocine
- MESuRS laboratory, Conservatoire National des Arts et Métiers, 292 Rue Saint-Martin, 75003, Paris, France
17
Lu FS, Nguyen AT, Link NB, Davis JT, Chinazzi M, Xiong X, Vespignani A, Lipsitch M, Santillana M. Estimating the Cumulative Incidence of COVID-19 in the United States Using Four Complementary Approaches. medRxiv 2020:2020.04.18.20070821. [PMID: 32587997 PMCID: PMC7310656 DOI: 10.1101/2020.04.18.20070821] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Indexed: 01/04/2023]
Abstract
Effectively designing and evaluating public health responses to the ongoing COVID-19 pandemic requires accurate estimation of the prevalence of COVID-19 across the United States (US). Equipment shortages and varying testing capabilities have, however, hindered the usefulness of the official reported positive COVID-19 case counts. We introduce four complementary approaches to estimate the cumulative incidence of symptomatic COVID-19 in each state in the US as well as Puerto Rico and the District of Columbia, using a combination of excess influenza-like illness reports, COVID-19 test statistics, COVID-19 mortality reports, and a spatially structured epidemic model. Instead of relying on the estimate from a single data source or method that may be biased, we provide multiple estimates, each relying on different assumptions and data sources. Across our four approaches emerges the consistent conclusion that on April 4, 2020, the estimated case count was 5 to 50 times higher than the official positive test counts across the different states. Nationally, our estimates of COVID-19 symptomatic cases as of April 4 have a likely range of 2.2 to 4.9 million, with possibly as many as 8.1 million cases, up to 26 times greater than the cumulative confirmed cases of about 311,000. Extending our method to May 16, 2020, we estimate that cumulative symptomatic incidence ranges from 6.0 to 10.3 million, as opposed to 1.5 million positive test counts. The proposed combination of approaches may prove useful in assessing the burden of COVID-19 during resurgences in the US and other countries with comparable surveillance systems.
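The national multipliers quoted in this abstract follow from dividing each estimate by the confirmed count. The short sketch below reproduces that arithmetic using only the rounded figures stated in the abstract; the function and variable names are illustrative.

```python
def underreporting_factor(estimated_cases, confirmed_cases):
    """Ratio of estimated symptomatic cases to officially confirmed cases."""
    return estimated_cases / confirmed_cases

# Figures as quoted in the abstract (rounded national totals).
april_confirmed = 311_000
april_estimates = {"likely low": 2_200_000,
                   "likely high": 4_900_000,
                   "upper": 8_100_000}
may_confirmed = 1_500_000
may_estimates = {"low": 6_000_000, "high": 10_300_000}

april_factors = {label: round(underreporting_factor(est, april_confirmed), 1)
                 for label, est in april_estimates.items()}
may_factors = {label: round(underreporting_factor(est, may_confirmed), 1)
               for label, est in may_estimates.items()}
```

On these figures the April upper estimate is about 26 times the confirmed count, matching the abstract, while the May 16 estimates correspond to a smaller multiple of roughly 4 to 7.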
Affiliation(s)
- Fred S. Lu
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
- Department of Statistics, Stanford University, Stanford, CA
- Andre T. Nguyen
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
- University of Maryland, Baltimore County, Baltimore, MD
- Booz Allen Hamilton, Columbia, MD
- Nicholas B. Link
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
- Jessica T. Davis
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA USA
- Matteo Chinazzi
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA USA
- Xinyue Xiong
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA USA
- Alessandro Vespignani
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA USA
- Marc Lipsitch
- Department of Epidemiology, Harvard T.H. Chan School of Public Health
- Mauricio Santillana
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health
- Department of Pediatrics, Harvard Medical School, Boston, MA
18
Aiken EL, McGough SF, Majumder MS, Wachtel G, Nguyen AT, Viboud C, Santillana M. Real-time estimation of disease activity in emerging outbreaks using internet search information. PLoS Comput Biol 2020; 16:e1008117. [PMID: 32804932 PMCID: PMC7451983 DOI: 10.1371/journal.pcbi.1008117] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/25/2019] [Revised: 08/27/2020] [Accepted: 07/01/2020] [Indexed: 11/18/2022] Open
Abstract
Understanding the behavior of emerging disease outbreaks in, or ahead of, real-time could help healthcare officials better design interventions to mitigate impacts on affected populations. Most healthcare-based disease surveillance systems, however, have significant inherent reporting delays due to data collection, aggregation, and distribution processes. Recent work has shown that machine learning methods leveraging a combination of traditionally collected epidemiological information and novel Internet-based data sources, such as disease-related Internet search activity, can produce meaningful "nowcasts" of disease incidence ahead of healthcare-based estimates, with most successful case studies focusing on endemic and seasonal diseases such as influenza and dengue. Here, we apply similar computational methods to emerging outbreaks in geographic regions where no historical presence of the disease of interest has been observed. By combining the limited historical epidemiological data available with disease-related Internet search activity, we retrospectively estimate disease activity in five recent outbreaks weeks ahead of traditional surveillance methods. We find that the proposed computational methods frequently provide useful real-time incidence estimates that can help fill temporal data gaps resulting from surveillance reporting delays. However, the proposed methods are limited by issues of sample bias and skew in search query volumes, perhaps as a result of media coverage.
Affiliation(s)
- Emily L. Aiken
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
- Sarah F. McGough
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Maimuna S. Majumder
- Department of Healthcare Policy, Harvard Medical School, Boston, Massachusetts, United States of America
- Gal Wachtel
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
- Andre T. Nguyen
- Booz Allen Hamilton, Columbia, Maryland, United States of America
- University of Maryland, Baltimore County, Baltimore, Maryland, United States of America
- Cecile Viboud
- Fogarty International Center, National Institutes of Health, Bethesda, Maryland, United States of America
- Mauricio Santillana
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, United States of America
19
Beesley LJ, Salvatore M, Fritsche LG, Pandit A, Rao A, Brummett C, Willer CJ, Lisabeth LD, Mukherjee B. The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities. Stat Med 2020; 39:773-800. [PMID: 31859414 PMCID: PMC7983809 DOI: 10.1002/sim.8445] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 10/10/2018] [Revised: 09/10/2019] [Accepted: 11/16/2019] [Indexed: 01/03/2023]
Abstract
Biobanks linked to electronic health records provide rich resources for health-related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large-scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis-generating studies of disease-treatment, disease-exposure, and disease-gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank-based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank-based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.
Affiliation(s)
- Anita Pandit
- University of Michigan, Department of Biostatistics
- Arvind Rao
- University of Michigan, Department of Computational Medicine and Bioinformatics
- Chad Brummett
- University of Michigan, Department of Anesthesiology
- Cristen J. Willer
- University of Michigan, Department of Computational Medicine and Bioinformatics
20
Gu D, Yang X, Deng S, Liang C, Wang X, Wu J, Guo J. Tracking Knowledge Evolution in Cloud Health Care Research: Knowledge Map and Common Word Analysis. J Med Internet Res 2020; 22:e15142. [PMID: 32130115 PMCID: PMC7064966 DOI: 10.2196/15142] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/24/2019] [Revised: 10/28/2019] [Accepted: 12/15/2019] [Indexed: 11/26/2022] Open
Abstract
Background: With the continuous development of the internet and the explosive growth in data, big data technology has emerged. With its ongoing development and application, cloud computing technology provides better data storage and analysis. The development of cloud health care provides a more convenient and effective solution for health. Studying the evolution of knowledge and research hotspots in the field of cloud health care is increasingly important for medical informatics. Scholars in the medical informatics community need to understand the extent of the evolution of and possible trends in cloud health care research to inform their future research. Objective: Drawing on the cloud health care literature, this study aimed to describe the development and evolution of research themes in cloud health care through a knowledge map and common word analysis. Methods: A total of 2878 articles about cloud health care were retrieved from the Web of Science database. We used cybermetrics to analyze and visualize the keywords in these articles. We created a knowledge map to show the evolution of cloud health care research. We used co-word analysis to identify the hotspots and their evolution in cloud health care research. Results: The evolution and development of cloud health care services are described. In 2007-2009 (Phase I), most scholars used cloud computing in the medical field mainly to reduce costs, and grid computing and cloud computing were the primary technologies. In 2010-2012 (Phase II), the security of cloud systems became of interest to scholars. In 2013-2015 (Phase III), medical informatization enabled big data for health services. In 2016-2017 (Phase IV), machine learning and mobile technologies were introduced to the medical field. Conclusions: Cloud health care research has been rapidly developing worldwide, and technologies used in cloud health research are simultaneously diverging and becoming smarter. Cloud-based mobile health, cloud-based smart health, and the security of cloud health data and systems are three possible trends in the future development of the cloud health care field.
Affiliation(s)
- Dongxiao Gu
- The School of Management, Hefei University of Technology, Hefei, China
- Xuejie Yang
- The School of Management, Hefei University of Technology, Hefei, China
- Shuyuan Deng
- The Seidman College of Business, Grand Valley State University, Grand Rapids, MI, United States
- Changyong Liang
- The School of Management, Hefei University of Technology, Hefei, China
- Xiaoyu Wang
- The 1st Affiliated Hospital, Anhui University of Traditional Chinese Medicine, Hefei, China
- Jiao Wu
- College of Business Administration, Central Michigan University, Mount Pleasant, MI, United States
- Jingjing Guo
- The School of Management, Hefei University of Technology, Hefei, China
21
Samaras L, García-Barriocanal E, Sicilia MA. Syndromic surveillance using web data: a systematic review. Innovation in Health Informatics 2020. [PMCID: PMC7153324 DOI: 10.1016/b978-0-12-819043-2.00002-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
In recent years, there has been much debate about the evolution of Smart Healthcare systems, particularly about how these systems can help improve people's health by taking advantage of new Information and Communication Technologies (ICT) for early prediction and efficient treatment. The purpose of this study is to provide a systematic review of the current literature on information systems for syndromic surveillance using web data. The published items considered include articles, books, reviews, reports, conference announcements, and dissertations. We used a variation of the PRISMA Statement methodology to conduct the systematic review. The review identifies the relevant papers published from 2004 to 2018 and systematically explores them to extract similarities, gaps, and conclusions on the research done so far. The results presented concern the year, the examined disease, the web data source, the geographic location/country, and the data analysis method used. The results show that influenza is the most examined infectious disease. The internet tools most used are Twitter and Google. Regarding the geographical areas explored in the published papers, the most examined country is the United States, since many scientists come from this country. There has been significant growth in the number of articles since 2009. Various statistical methods are also used to correlate the data retrieved from the internet with the data from national authorities. The overall conclusion of the research is that the Web can be a useful tool for detecting serious epidemics and for creating a syndromic surveillance system, since epidemics can be predicted from web data before they are officially detected in the population. With the advance of ICT, Smart Healthcare can benefit from the monitoring and early prediction of epidemics that such a system provides, improving national or international health strategies and policy decisions. This can be achieved through the provision of new technology tools to enhance health monitoring systems toward the new innovations of Smart Health or eHealth, even with the emerging technologies of the Internet of Things. The challenges and impacts of an electronic system based on internet data span the social, medical, and technological disciplines. These can be further extended to Smart Healthcare, as data streaming can provide real-time information, awareness of epidemics, and alerts for both patients and medical scientists. Finally, these new systems can help improve the standards of human life.
|
22
|
Bublitz FM, Oetomo A, Sahu KS, Kuang A, Fadrique LX, Velmovitsky PE, Nobrega RM, Morita PP. Disruptive Technologies for Environment and Health Research: An Overview of Artificial Intelligence, Blockchain, and Internet of Things. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2019; 16:E3847. [PMID: 31614632 PMCID: PMC6843531 DOI: 10.3390/ijerph16203847] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 08/16/2019] [Revised: 10/05/2019] [Accepted: 10/07/2019] [Indexed: 12/13/2022]
Abstract
The purpose of this descriptive research paper is to initiate discussions on the use of innovative technologies and their potential to support the research and development of pan-Canadian monitoring and surveillance activities associated with environmental impacts on health and within the health system. Its primary aim is to provide a review of disruptive technologies and their current uses in the environment and in healthcare. Drawing on extensive experience in population-level surveillance through the use of technology, knowledge from prior projects in the field, and a review of the technologies, this paper is meant to serve as an initial step toward a better understanding of the research area. In doing so, we hope to be able to better assess which technologies might best be leveraged to advance this unique intersection of health and environment. This paper first outlines the current use of technologies at the intersection of public health and the environment, in particular, Artificial Intelligence (AI), Blockchain, and the Internet of Things (IoT). The paper provides a description of each of these technologies, along with a summary of their current applications and a description of the challenges one might face in adopting them. Thereafter, a high-level reference architecture that addresses the challenges of the described technologies, and that could potentially be incorporated into the pan-Canadian surveillance system, is conceived and presented.
Affiliation(s)
- Frederico M Bublitz
- School of Public Health and Health Systems, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
- Center for Strategic Technologies in Health (NUTES), State University of Paraiba (UEPB), Campina Grande, PB 58429-500, Brazil.
| | - Arlene Oetomo
- School of Public Health and Health Systems, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
| | - Kirti S Sahu
- School of Public Health and Health Systems, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
| | - Amethyst Kuang
- School of Public Health and Health Systems, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
| | - Laura X Fadrique
- School of Public Health and Health Systems, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
| | - Pedro E Velmovitsky
- School of Public Health and Health Systems, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
| | - Raphael M Nobrega
- School of Public Health and Health Systems, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
| | - Plinio P Morita
- School of Public Health and Health Systems, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
- Institute of Health Policy, Management, and Evaluation, University of Toronto, Toronto, ON M5T 3M6, Canada.
- Research Institute for Aging, University of Waterloo, Waterloo, ON N2J 0E2, Canada.
- Department of Systems Design Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
- eHealth Innovation, Techna Institute, University Health Network, Toronto, ON M5G 2C4, Canada.
|
23
|
Baltrusaitis K, Vespignani A, Rosenfeld R, Gray J, Raymond D, Santillana M. Differences in Regional Patterns of Influenza Activity Across Surveillance Systems in the United States: Comparative Evaluation. JMIR Public Health Surveill 2019; 5:e13403. [PMID: 31579019 PMCID: PMC6777281 DOI: 10.2196/13403] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/16/2019] [Revised: 07/02/2019] [Accepted: 07/19/2019] [Indexed: 01/30/2023]
Abstract
BACKGROUND The Centers for Disease Control and Prevention (CDC) tracks influenza-like illness (ILI) using information on patient visits to health care providers through the Outpatient Influenza-like Illness Surveillance Network (ILINet). As participation in this system is voluntary, the composition, coverage, and consistency of health care reports vary from state to state, leading to different measures of ILI activity between regions. The degree to which these measures reflect actual differences in influenza activity or systematic differences in the methods used to collect and aggregate the data is unclear. OBJECTIVE The objective of our study was to qualitatively and quantitatively compare national and region-specific ILI activity in the United States across 4 surveillance data sources, namely CDC ILINet, Flu Near You (FNY), athenahealth, and HealthTweets.org, to determine whether these data sources, commonly used as input in influenza modeling efforts, show geographical patterns that are similar to those observed in CDC ILINet's data. We also compared the yearly percentage of FNY participants who sought health care for ILI symptoms across geographical areas. METHODS We compared the national and regional 2018-2019 ILI activity baselines, calculated using noninfluenza weeks from previous years, for each surveillance data source. We also compared measures of ILI activity across geographical areas during 3 influenza seasons, 2015-2016, 2016-2017, and 2017-2018. Geographical differences in weekly ILI activity within each data source were also assessed using relative mean differences and time series heatmaps. National and regional age-adjusted health care-seeking percentages were calculated for each influenza season by dividing the number of FNY participants who sought medical care for ILI symptoms by the total number of ILI reports within an influenza season. 
Pearson correlations were used to assess the association between the health care-seeking percentages and baselines for each surveillance data source. RESULTS We observed consistent differences in ILI activity across geographical areas for CDC ILINet and athenahealth data. ILI activity for FNY displayed little variation across geographical areas, whereas differences in ILI activity for HealthTweets.org were associated with the total number of tweets within a geographical area. The percentage of FNY participants who sought health care for ILI symptoms differed slightly across geographical areas, and these percentages were positively correlated with CDC ILINet and athenahealth baselines. CONCLUSIONS Our findings suggest that differences in ILI activity across geographical areas as reported by a given surveillance system may not accurately reflect true differences in the prevalence of ILI. Instead, these differences may reflect systematic collection and aggregation biases that are particular to each system and consistent across influenza seasons. These findings are potentially relevant in the real-time analysis of the influenza season and in the definition of unbiased forecast models.
Affiliation(s)
- Kristin Baltrusaitis
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States
| | | | - Roni Rosenfeld
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, United States
| | - Josh Gray
- athenaResearch at athenahealth, Watertown, MA, United States
| | - Dorrie Raymond
- athenaResearch at athenahealth, Watertown, MA, United States
| | - Mauricio Santillana
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States; Department of Pediatrics, Harvard Medical School, Boston, MA, United States
|
24
|
Su K, Xu L, Li G, Ruan X, Li X, Deng P, Li X, Li Q, Chen X, Xiong Y, Lu S, Qi L, Shen C, Tang W, Rong R, Hong B, Ning Y, Long D, Xu J, Shi X, Yang Z, Zhang Q, Zhuang Z, Zhang L, Xiao J, Li Y. Forecasting influenza activity using self-adaptive AI model and multi-source data in Chongqing, China. EBioMedicine 2019; 47:284-292. [PMID: 31477561 PMCID: PMC6796527 DOI: 10.1016/j.ebiom.2019.08.024] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 05/24/2019] [Revised: 08/09/2019] [Accepted: 08/09/2019] [Indexed: 02/05/2023]
Abstract
Background Early detection of influenza activity followed by timely response is a critical component of preparedness for seasonal influenza epidemics and influenza pandemics. However, most relevant studies were conducted at the regional or national level with regular seasonal influenza trends. There are few feasible strategies to forecast influenza activity at the local level with irregular trends. Methods Multi-source electronic data, including historical percentage of influenza-like illness (ILI%), weather data, Baidu search index and Sina Weibo data of Chongqing, China, were collected and integrated into an innovative Self-adaptive AI Model (SAAIM), which was constructed by integrating a Seasonal Autoregressive Integrated Moving Average model and an XGBoost model using a self-adaptive weight adjustment mechanism. SAAIM was applied to forecast ILI% in Chongqing from 2017 to 2018, and its performance was compared with that of three previously available forecasting models. Findings ILI% showed an irregular seasonal trend from 2012 to 2018 in Chongqing. Compared with the three reference models, SAAIM achieved the best performance in forecasting ILI% in Chongqing, with a mean absolute percentage error (MAPE) of 11·9%, 7·5%, and 11·9% during the periods 2014–2016, 2017, and 2018, respectively. Among the three categories of source data, historical influenza activity contributed the most to the forecast accuracy by decreasing the MAPE by 19·6%, 43·1%, and 11·1%, followed by weather information (MAPE reduced by 3·3%, 17·1%, and 2·2%), and Internet-related public sentiment data (MAPE reduced by 1·1%, 0·9%, and 1·3%). Interpretation Accurate influenza forecasts in areas with irregular seasonal influenza trends can be made by SAAIM with multi-source electronic data.
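The abstract above reports accuracy as MAPE and describes combining two forecasters through a self-adaptive weight adjustment. The following is a minimal sketch of both ideas, not the authors' SAAIM implementation: the abstract does not specify the weighting rule, so the inverse-error weighting used here, along with the function names and the toy numbers, is an assumption made purely for illustration.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def inverse_error_weights(actual, forecasts):
    """Weight each model by the inverse of its MAPE on a recent window.

    Hypothetical rule: the source only says the weights are 'self-adaptive'.
    """
    inverse = [1.0 / max(mape(actual, f), 1e-9) for f in forecasts]
    total = sum(inverse)
    return [w / total for w in inverse]


def combine(next_forecasts, weights):
    """Weighted average of the individual models' next-step forecasts."""
    return sum(w * f for w, f in zip(weights, next_forecasts))


# Toy ILI% values for a recent window (invented for illustration).
observed = [2.0, 4.0, 5.0]
model_a = [2.2, 3.6, 5.5]   # stands in for a seasonal ARIMA forecast
model_b = [2.4, 3.2, 6.0]   # stands in for a gradient-boosted forecast

weights = inverse_error_weights(observed, [model_a, model_b])
ensemble_next = combine([3.0, 6.0], weights)
```

Recomputing the weights each week on a sliding window lets the model that has recently been more accurate dominate the ensemble, which is one plausible reading of "self-adaptive".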
Affiliation(s)
- Kun Su
- Department of Epidemiology, College of Preventive Medicine, Army Medical University (Third Military Medical University), Chongqing, People's Republic of China; Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
| | - Liang Xu
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Guanqiao Li
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China
| | - Xiaowen Ruan
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Xian Li
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Pan Deng
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Xinmi Li
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Qin Li
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
| | - Xianxian Chen
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Yu Xiong
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
| | - Shaofeng Lu
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Li Qi
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
| | - Chaobo Shen
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Wenge Tang
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
| | - Rong Rong
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
| | - Boran Hong
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Yi Ning
- Meinian Institute of Health, Beijing, People's Republic of China
| | - Dongyan Long
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Jiaying Xu
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Xuanling Shi
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China
| | - Zhihong Yang
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Qi Zhang
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China
| | - Ziqi Zhuang
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Linqi Zhang
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China.
| | - Jing Xiao
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China.
| | - Yafei Li
- Department of Epidemiology, College of Preventive Medicine, Army Medical University (Third Military Medical University), Chongqing, People's Republic of China.
|
25
|
Masri S, Jia J, Li C, Zhou G, Lee MC, Yan G, Wu J. Use of Twitter data to improve Zika virus surveillance in the United States during the 2016 epidemic. BMC Public Health 2019; 19:761. [PMID: 31200692 PMCID: PMC6570872 DOI: 10.1186/s12889-019-7103-8] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 10/05/2018] [Accepted: 06/04/2019] [Indexed: 12/05/2022]
Abstract
BACKGROUND Zika virus (ZIKV) is an emerging mosquito-borne arbovirus that can produce serious public health consequences. In 2016, ZIKV caused an epidemic in many countries around the world, including the United States. ZIKV surveillance and vector control are essential to combating future epidemics. However, challenges relating to the timely publication of case reports significantly limit the effectiveness of current surveillance methods. In many countries with poor infrastructure, established systems for case reporting often do not exist. Previous studies investigating the H1N1 pandemic, general influenza and the recent Ebola outbreak have demonstrated that time- and geo-tagged Twitter data, which are immediately available, can be utilized to overcome these limitations. METHODS In this study, we employed a recently developed system called Cloudberry to filter a random sample of Twitter data to investigate the feasibility of using such data for ZIKV epidemic tracking on a national and state (Florida) level. Two auto-regressive models were calibrated using weekly ZIKV case counts and zika tweets in order to estimate weekly ZIKV cases 1 week in advance. RESULTS While models tended to over-predict at low case counts and under-predict at extremely high counts, a comparison of predicted versus observed weekly ZIKV case counts following model calibration demonstrated overall reasonable predictive accuracy, with an R² of 0.74 for the Florida model and 0.70 for the U.S. model. Time-series analysis of predicted and observed ZIKV cases following internal cross-validation exhibited very similar patterns, demonstrating reasonable model performance. Spatially, the distribution of cumulative ZIKV case counts (local- & travel-related) and zika tweets across all 50 U.S. states showed a high correlation (r = 0.73) after adjusting for population. CONCLUSIONS This study demonstrates the value of utilizing Twitter data for the purposes of disease surveillance. 
This is of high value to epidemiologists and public health officials charged with protecting the public during future outbreaks.
Affiliation(s)
- Shahir Masri
- Program in Public Health, College of Health Sciences, University of California, Irvine, California, USA
| | - Jianfeng Jia
- Department of Computer Science, University of California, Irvine, California, USA
| | - Chen Li
- Department of Computer Science, University of California, Irvine, California, USA
| | - Guofa Zhou
- Program in Public Health, College of Health Sciences, University of California, Irvine, California, USA
| | - Ming-Chieh Lee
- Program in Public Health, College of Health Sciences, University of California, Irvine, California, USA
| | - Guiyun Yan
- Program in Public Health, College of Health Sciences, University of California, Irvine, California, USA
| | - Jun Wu
- Program in Public Health, College of Health Sciences, University of California, Irvine, California, USA.
|
26
|
Yang CY, Chen RJ, Chou WL, Lee YJ, Lo YS. An Integrated Influenza Surveillance Framework Based on National Influenza-Like Illness Incidence and Multiple Hospital Electronic Medical Records for Early Prediction of Influenza Epidemics: Design and Evaluation. J Med Internet Res 2019; 21:e12341. [PMID: 30707099 PMCID: PMC6376337 DOI: 10.2196/12341] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 09/29/2018] [Revised: 12/18/2018] [Accepted: 01/20/2019] [Indexed: 11/13/2022]
Abstract
BACKGROUND Influenza is a leading cause of death worldwide and contributes to heavy economic losses to individuals and communities. Therefore, the early prediction of and interventions against influenza epidemics are crucial to reduce mortality and morbidity because of this disease. Similar to other countries, the Taiwan Centers for Disease Control and Prevention (TWCDC) has implemented influenza surveillance and reporting systems, which primarily rely on influenza-like illness (ILI) data reported by health care providers, for the early prediction of influenza epidemics. However, these surveillance and reporting systems show at least a 2-week delay in prediction, indicating the need for improvement. OBJECTIVE We aimed to integrate the TWCDC ILI data with electronic medical records (EMRs) of multiple hospitals in Taiwan. Our ultimate goal was to develop a national influenza trend prediction and reporting tool more accurate and efficient than the current influenza surveillance and reporting systems. METHODS First, the influenza expertise team at Taipei Medical University Health Care System (TMUHcS) identified surveillance variables relevant to the prediction of influenza epidemics. Second, we developed a framework for integrating the EMRs of multiple hospitals with the ILI data from the TWCDC website to proactively provide results of influenza epidemic monitoring to hospital infection control practitioners. Third, using the TWCDC ILI data as the gold standard for influenza reporting, we calculated Pearson correlation coefficients to measure the strength of the linear relationship between TMUHcS EMRs and regional and national TWCDC ILI data for 2 weekly time series datasets. Finally, we used the Moving Epidemic Method analyses to evaluate each surveillance variable for its predictive power for influenza epidemics. RESULTS Using this framework, we collected the EMRs and TWCDC ILI data of the past 3 influenza seasons (October 2014 to September 2017). 
On the basis of the EMRs of multiple hospitals, 3 surveillance variables, TMUHcS-ILI, TMUHcS-rapid influenza laboratory tests with positive results (RITP), and TMUHcS-influenza medication use (IMU), which reflected patients with ILI, those with positive results from rapid influenza diagnostic tests, and those treated with antiviral drugs, respectively, showed strong correlations with the TWCDC regional and national ILI data (r=.86-.98). Two of these surveillance variables, TMUHcS-RITP and TMUHcS-IMU, showed predictive power for influenza epidemics 3 to 4 weeks before the increase noted in the TWCDC ILI reports. CONCLUSIONS Our framework periodically integrated and compared surveillance data from multiple hospitals and the TWCDC website to maintain a certain prediction quality and proactively provide monitored results. Our results can be extended to other infectious diseases, mitigating the time and effort required for data collection and analysis. Furthermore, this approach may be developed as a cost-effective electronic surveillance tool for the early and accurate prediction of epidemics of influenza and other infectious diseases in densely populated regions and nations.
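The abstract above measures agreement between hospital-derived weekly series and official ILI series with Pearson correlation coefficients. The sketch below shows that calculation in plain Python. The lagged variant is an added assumption for illustrating how a lead time could be explored; the study itself assessed predictive power with the Moving Epidemic Method, which is not reproduced here. All series values and function names are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def lagged_pearson(leading, target, lag):
    """Correlate leading[t] with target[t + lag] (lag in weeks, lag >= 0).

    Hypothetical helper: a high value at lag k suggests the leading series
    moves about k weeks ahead of the target series.
    """
    if lag == 0:
        return pearson(leading, target)
    return pearson(leading[:-lag], target[lag:])


# Toy weekly counts: the official series repeats the hospital series 2 weeks later.
hospital_weekly = [1, 2, 4, 8, 5, 3, 2, 1]
official_weekly = [9, 9, 1, 2, 4, 8, 5, 3]

same_week = lagged_pearson(hospital_weekly, official_weekly, 0)
two_week_lead = lagged_pearson(hospital_weekly, official_weekly, 2)
```

Scanning a small range of lags and picking the one with the highest coefficient gives a rough first estimate of lead time, though it says nothing about epidemic thresholds.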
Affiliation(s)
- Cheng-Yi Yang
- Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan
| | - Ray-Jade Chen
- Department of Surgery, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan; Taipei Medical University Hospital, Taipei, Taiwan
| | - Wan-Lin Chou
- Taipei Medical University Hospital, Taipei, Taiwan
| | - Yuarn-Jang Lee
- Division of Infectious Disease, Department of Internal Medicine, Taipei Medical University Hospital, Taipei, Taiwan
| | - Yu-Sheng Lo
- Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan
|
27
|
Kandula S, Shaman J. Near-term forecasts of influenza-like illness: An evaluation of autoregressive time series approaches. Epidemics 2019; 27:41-51. [PMID: 30792135 DOI: 10.1016/j.epidem.2019.01.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/15/2018] [Revised: 12/15/2018] [Accepted: 01/16/2019] [Indexed: 10/27/2022]
Abstract
Seasonal influenza in the United States is estimated to cause 9-35 million illnesses annually, with resultant economic burden amounting to $47-$150 billion. Reliable real-time forecasts of influenza can help public health agencies better manage these outbreaks. Here, we investigate the feasibility of three autoregressive methods for near-term forecasts: an Autoregressive Integrated Moving Average (ARIMA) model with time-varying order; an ARIMA model fit to seasonally adjusted incidence rates (ARIMA-STL); and a feed-forward autoregressive artificial neural network with a single hidden layer (AR-NN). We generated retrospective forecasts for influenza incidence one to four weeks in the future at the US national level and for 10 US regions during 5 influenza seasons. We compared the relative accuracy of the point and probabilistic forecasts of the three models with respect to each other and in relation to two large external validation sets that each comprise at least 20 other models. Both the probabilistic and point forecasts of AR-NN were found to be more accurate than those of the other two models overall. An additional sub-analysis found that the three models benefitted considerably from the use of search-trend-based 'nowcasts' as a proxy for surveillance data, and these three models with use of nowcasts were found to be the highest ranked models in both validation datasets. When the nowcasts were withheld, the three models remained competitive relative to models in the validation sets. The difference in accuracy among the three models, and relative to models of the validation sets, was found to be largely statistically significant. Our results suggest that autoregressive models, even when not equipped to capture transmission dynamics, can provide reasonably accurate near-term forecasts for influenza. 
Existing support in open-source libraries makes them suitable non-naïve baselines for model comparison studies and for operational forecasts in resource-constrained settings where more sophisticated methods may not be feasible.
Affiliation(s)
- Sasikiran Kandula
- Department of Environmental Health Sciences, Columbia University, New York, NY, United States.
| | - Jeffrey Shaman
- Department of Environmental Health Sciences, Columbia University, New York, NY, United States
|
28
|
Lu FS, Hattab MW, Clemente CL, Biggerstaff M, Santillana M. Improved state-level influenza nowcasting in the United States leveraging Internet-based data and network approaches. Nat Commun 2019; 10:147. [PMID: 30635558 PMCID: PMC6329822 DOI: 10.1038/s41467-018-08082-0] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 07/25/2018] [Accepted: 12/12/2018] [Indexed: 12/01/2022]
Abstract
In the presence of health threats, precision public health approaches aim to provide targeted, timely, and population-specific interventions. Accurate surveillance methodologies that can estimate infectious disease activity ahead of official healthcare-based reports, at relevant spatial resolutions, are important for achieving this goal. Here we introduce a methodological framework which dynamically combines two distinct influenza tracking techniques, using an ensemble machine learning approach, to achieve improved state-level influenza activity estimates in the United States. The two predictive techniques behind the ensemble utilize (1) a self-correcting statistical method combining influenza-related Google search frequencies, information from electronic health records, and historical flu trends within each state, and (2) a network-based approach leveraging spatio-temporal synchronicities observed in historical influenza activity across states. The ensemble considerably outperforms each component method in addition to previously proposed state-specific methods for influenza tracking, with higher correlations and lower prediction errors.
Affiliation(s)
- Fred S Lu
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02115, USA.
| | - Mohammad W Hattab
- Wyss Institute for Biologically Inspired Engineering, Harvard Medical School, Boston, MA, 02115, USA
| | | | - Matthew Biggerstaff
- Influenza Division, National Center for Immunization and Respiratory Disease, Centers for Disease Control and Prevention, Atlanta, GA, 30333, USA
| | - Mauricio Santillana
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02115, USA.
- Department of Pediatrics, Harvard Medical School, Boston, MA, 02115, USA.
|
29
|
Lipsitch M, Santillana M. Enhancing Situational Awareness to Prevent Infectious Disease Outbreaks from Becoming Catastrophic. Curr Top Microbiol Immunol 2019; 424:59-74. [DOI: 10.1007/82_2019_172] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 01/06/2023]
|
30
|
Poirier C, Lavenu A, Bertaud V, Campillo-Gimenez B, Chazard E, Cuggia M, Bouzillé G. Real Time Influenza Monitoring Using Hospital Big Data in Combination with Machine Learning Methods: Comparison Study. JMIR Public Health Surveill 2018; 4:e11361. [PMID: 30578212 PMCID: PMC6320394 DOI: 10.2196/11361] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 06/21/2018] [Revised: 09/10/2018] [Accepted: 09/10/2018] [Indexed: 11/25/2022]
Abstract
Background Traditional surveillance systems produce estimates of influenza-like illness (ILI) incidence rates, but with a 1- to 3-week delay. Accurate real-time monitoring systems for influenza outbreaks could be useful for making public health decisions. Several studies have investigated the possibility of using internet users’ activity data and different statistical models to predict influenza epidemics in near real time. However, very few studies have investigated hospital big data. Objective Here, we compared internet and electronic health records (EHRs) data and different statistical models to identify the best approach (data type and statistical model) for ILI estimates in real time. Methods We used Google data for internet data and the clinical data warehouse eHOP, which included all EHRs from Rennes University Hospital (France), for hospital data. We compared 3 statistical models: random forest, elastic net, and support vector machine (SVM). Results For the national ILI incidence rate, the best correlation was 0.98 and the mean squared error (MSE) was 866, obtained with hospital data and the SVM model. For the Brittany region, the best correlation was 0.923 and the MSE was 2364, obtained with hospital data and the SVM model. Conclusions We found that EHR data together with historical epidemiological information (French Sentinelles network) allowed for accurately predicting ILI incidence rates for all of France as well as for the Brittany region and outperformed the internet data regardless of the statistical model used. Moreover, the performance of the two statistical models, elastic net and SVM, was comparable.
Affiliation(s)
- Canelle Poirier
- Laboratoire Traitement du Signal et de l'Image, Université de Rennes 1, Rennes, France; INSERM, U1099, Rennes, France
| | - Audrey Lavenu
- Centre d'Investigation Clinique de Rennes, Université de Rennes 1, Rennes, France
| | - Valérie Bertaud
- Laboratoire Traitement du Signal et de l'Image, Université de Rennes 1, Rennes, France; INSERM, U1099, Rennes, France; Centre Hospitalier Universitaire de Rennes, Centre de Données Cliniques, Rennes, France
| | - Boris Campillo-Gimenez
- INSERM, U1099, Rennes, France; Comprehensive Cancer Regional Center, Eugene Marquis, Rennes, France
| | - Emmanuel Chazard
- Centre d'Etudes et de Recherche en Informatique Médicale EA2694, Université de Lille, Lille, France; Public Health Department, Centre Hospitalier Régional Universitaire de Lille, Lille, France
| | - Marc Cuggia
- Laboratoire Traitement du Signal et de l'Image, Université de Rennes 1, Rennes, France; INSERM, U1099, Rennes, France; Centre Hospitalier Universitaire de Rennes, Centre de Données Cliniques, Rennes, France
| | - Guillaume Bouzillé
- Laboratoire Traitement du Signal et de l'Image, Université de Rennes 1, Rennes, France; INSERM, U1099, Rennes, France; Centre Hospitalier Universitaire de Rennes, Centre de Données Cliniques, Rennes, France
|
31
|
Vilar S, Friedman C, Hripcsak G. Detection of drug-drug interactions through data mining studies using clinical sources, scientific literature and social media. Brief Bioinform 2018; 19:863-877. [PMID: 28334070 PMCID: PMC6454455 DOI: 10.1093/bib/bbx010] [Citation(s) in RCA: 90] [Impact Index Per Article: 12.9] [Received: 10/14/2016] [Revised: 12/28/2016] [Indexed: 11/13/2022]
Abstract
Drug-drug interactions (DDIs) constitute an important concern in drug development and postmarketing pharmacovigilance. They are considered the cause of many adverse drug effects exposing patients to higher risks and increasing public health system costs. Methods to follow-up and discover possible DDIs causing harm to the population are a primary aim of drug safety researchers. Here, we review different methodologies and recent advances using data mining to detect DDIs with impact on patients. We focus on data mining of different pharmacovigilance sources, such as the US Food and Drug Administration Adverse Event Reporting System and electronic health records from medical institutions, as well as on the diverse data mining studies that use narrative text available in the scientific biomedical literature and social media. We pay attention to the strengths but also further explain challenges related to these methods. Data mining has important applications in the analysis of DDIs showing the impact of the interactions as a cause of adverse effects, extracting interactions to create knowledge data sets and gold standards and in the discovery of novel and dangerous DDIs.
Affiliation(s)
- Santiago Vilar
- Department of Biomedical Informatics, Columbia University, New York, USA
- Department of Organic Chemistry, University of Santiago de Compostela, Spain
| | - Carol Friedman
- Department of Biomedical Informatics, Columbia University, New York, USA
| | - George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, USA
| |
|
32
|
Baltrusaitis K, Brownstein JS, Scarpino SV, Bakota E, Crawley AW, Conidi G, Gunn J, Gray J, Zink A, Santillana M. Comparison of crowd-sourced, electronic health records based, and traditional health-care based influenza-tracking systems at multiple spatial resolutions in the United States of America. BMC Infect Dis 2018; 18:403. [PMID: 30111305 PMCID: PMC6094455 DOI: 10.1186/s12879-018-3322-3] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 08/17/2017] [Accepted: 08/09/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND Influenza causes an estimated 3000 to 50,000 deaths per year in the United States of America (US). Timely and representative data can help local, state, and national public health officials monitor and respond to outbreaks of seasonal influenza. Data from cloud-based electronic health records (EHR) and crowd-sourced influenza surveillance systems have the potential to provide complementary, near real-time estimates of influenza activity. The objectives of this paper are to compare two novel influenza-tracking systems with three traditional healthcare-based influenza surveillance systems at four spatial resolutions: national, regional, state, and city, and to determine the minimum number of participants in these systems required to produce influenza activity estimates that resemble the historical trends recorded by traditional surveillance systems. METHODS We compared influenza activity estimates from five influenza surveillance systems: 1) patient visits for influenza-like illness (ILI) from the US Outpatient ILI Surveillance Network (ILINet), 2) virologic data from World Health Organization (WHO) Collaborating and National Respiratory and Enteric Virus Surveillance System (NREVSS) Laboratories, 3) Emergency Department (ED) syndromic surveillance from Boston, Massachusetts, 4) patient visits for ILI from EHR, and 5) reports of ILI from the crowd-sourced system, Flu Near You (FNY), by calculating correlations between these systems across four influenza seasons, 2012-16, at four different spatial resolutions in the US. For the crowd-sourced system, we also used a bootstrapping statistical approach to estimate the minimum number of reports necessary to produce a meaningful signal at a given spatial resolution. RESULTS In general, as the spatial resolution increased, correlation values between all influenza surveillance systems decreased. 
Influenza-like illness rates in geographic areas with more than 250 crowd-sourced participants or with more than 20,000 visit counts for EHR tracked government-led estimates of influenza activity. CONCLUSIONS With a sufficient number of reports, data from novel influenza surveillance systems can complement traditional healthcare-based systems at multiple spatial resolutions.
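The abstract mentions a bootstrapping approach for estimating the minimum number of crowd-sourced reports needed for a meaningful signal, but it does not spell out the procedure. The sketch below shows one way such an estimate could be set up, under stated assumptions: reports are binary (1 = ILI, 0 = no ILI), the criterion is a median Pearson correlation threshold, and the weekly rates, pool size, and function names are all invented for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def median_bootstrap_correlation(reports_by_week, reference, n, draws, rng):
    """Median correlation with the reference when n reports are resampled per week."""
    corrs = []
    for _ in range(draws):
        # Resample n reports with replacement from each week's pool of reports.
        rates = [sum(rng.choices(week, k=n)) / n for week in reports_by_week]
        corrs.append(pearson(rates, reference))
    return statistics.median(corrs)


def minimum_reports(reports_by_week, reference, sizes, threshold, draws=100, seed=0):
    """Smallest candidate sample size whose median correlation meets the threshold."""
    rng = random.Random(seed)
    for n in sorted(sizes):
        if median_bootstrap_correlation(reports_by_week, reference, n, draws, rng) >= threshold:
            return n
    return None


# Hypothetical reference ILI rates for eight weeks and a synthetic pool of reports per week.
REFERENCE = [0.01, 0.02, 0.04, 0.07, 0.05, 0.03, 0.015, 0.01]
POOL = 5000
REPORTS = [[1] * round(r * POOL) + [0] * (POOL - round(r * POOL)) for r in REFERENCE]
```

Calling `minimum_reports(REPORTS, REFERENCE, sizes=[20, 1000], threshold=0.9)` scans the candidate sizes in increasing order. The threshold and the candidate sizes are design choices in this sketch, not values taken from the paper.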
Affiliation(s)
- Kristin Baltrusaitis
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 02115 USA
- Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Avenue 3rd Floor, Boston, MA 02118 USA
| | - John S. Brownstein
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 02115 USA
- Harvard Medical School, Boston, MA 02115 USA
| | - Samuel V. Scarpino
- Department of Mathematics and Statistics, University of Vermont, Vermont, USA
| | - Eric Bakota
- City of Houston Health Department, Houston, TX 77054 USA
| | | | | | - Julia Gunn
- Boston Public Health Commission, Boston, MA USA
| | - Josh Gray
- athenaResearch at athenahealth, Watertown, MA USA
| | - Anna Zink
- athenaResearch at athenahealth, Watertown, MA USA
| | - Mauricio Santillana
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 02115 USA
- Harvard Medical School, Boston, MA 02115 USA
| |
|
33
|
Yoon J, Kim JW, Jang B. DiTeX: Disease-related topic extraction system through internet-based sources. PLoS One 2018; 13:e0201933. [PMID: 30075009 PMCID: PMC6075781 DOI: 10.1371/journal.pone.0201933] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 05/08/2018] [Accepted: 07/24/2018] [Indexed: 11/19/2022]
Abstract
This paper describes a web-based automated disease-related topic extraction system, called DiTeX, which monitors important disease-related topics and provides associated information. National disease surveillance systems require a considerable amount of time to inform people of recent outbreaks of diseases. To solve this problem, many studies have used Internet-based sources such as news and social network services (SNS). However, these sources contain many intentional elements that interfere with extracting important topics. To address this challenge, we employ Natural Language Processing and an effective ranking algorithm, and develop DiTeX, which provides important disease-related topics. This report describes the web front-end and back-end architecture, implementation, performance of the ranking algorithm, and captured topics of DiTeX. We describe processes for collecting Internet-based data and extracting disease-related topics based on search keywords. Our system then applies a ranking algorithm to evaluate the importance of disease-related topics extracted from these data. Finally, we conduct an analysis based on real-world incidents to evaluate the performance and the effectiveness of DiTeX. To evaluate DiTeX, we analyze the ranking of well-known disease-related incidents for various ranking algorithms. The topic extraction rate of our ranking algorithm is superior to those of others. We demonstrate the validity of DiTeX by summarizing the disease-related topics of each day extracted by our system. To our knowledge, DiTeX is the world’s first automated web-based real-time service system that extracts and presents disease-related topics, trends and related data through web-based sources. DiTeX is now available on the web through http://epidemic.co.kr/media/topics.
Affiliation(s)
- Jungwon Yoon
- Department of Computer Science, Sangmyung University, Seoul, South Korea
| | - Jong Wook Kim
- Department of Computer Science, Sangmyung University, Seoul, South Korea
| | - Beakcheol Jang
- Department of Computer Science, Sangmyung University, Seoul, South Korea
| |
|
34
|
Kandula S, Yamana T, Pei S, Yang W, Morita H, Shaman J. Evaluation of mechanistic and statistical methods in forecasting influenza-like illness. J R Soc Interface 2018; 15:20180174. [PMID: 30045889 PMCID: PMC6073642 DOI: 10.1098/rsif.2018.0174] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 03/13/2018] [Accepted: 07/02/2018] [Indexed: 11/25/2022]
Abstract
A variety of mechanistic and statistical methods to forecast seasonal influenza have been proposed and are in use; however, the effects of various data issues and design choices (statistical versus mechanistic methods, for example) on the accuracy of these approaches have not been thoroughly assessed. Here, we compare the accuracy of three forecasting approaches (a mechanistic method, a weighted average of two statistical methods and a super-ensemble of eight statistical and mechanistic models) in predicting seven outbreak characteristics of seasonal influenza during the 2016-2017 season at the national and 10 regional levels in the USA. For each of these approaches, we report the effects of real time under- and over-reporting in surveillance systems, use of non-surveillance proxies of influenza activity and manual override of model predictions on forecast quality. Our results suggest that a meta-ensemble of statistical and mechanistic methods has better overall accuracy than the individual methods. Supplementing surveillance data with proxy estimates generally improves the quality of forecasts and transient reporting errors degrade the performance of all three approaches considerably. The improvement in quality from ad hoc and post-forecast changes suggests that domain experts continue to possess information that is not being sufficiently captured by current forecasting approaches.
Affiliation(s)
- Sasikiran Kandula
- Department of Environmental Health Sciences, Columbia University, New York, NY, USA
| | - Teresa Yamana
- Department of Environmental Health Sciences, Columbia University, New York, NY, USA
| | - Sen Pei
- Department of Environmental Health Sciences, Columbia University, New York, NY, USA
| | - Wan Yang
- Department of Environmental Health Sciences, Columbia University, New York, NY, USA
| | - Haruka Morita
- Department of Environmental Health Sciences, Columbia University, New York, NY, USA
| | - Jeffrey Shaman
- Department of Environmental Health Sciences, Columbia University, New York, NY, USA
| |
|
35
|
Hu H, Wang H, Wang F, Langley D, Avram A, Liu M. Prediction of influenza-like illness based on the improved artificial tree algorithm and artificial neural network. Sci Rep 2018; 8:4895. [PMID: 29559649 PMCID: PMC5861130 DOI: 10.1038/s41598-018-23075-1] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 01/11/2018] [Accepted: 03/05/2018] [Indexed: 12/24/2022]
Abstract
Because influenza is a contagious respiratory illness that seriously threatens public health, accurate real-time prediction of influenza outbreaks may help save lives. In this paper, we use the Twitter data set and the United States Centers for Disease Control’s influenza-like illness (ILI) data set to predict a nearly real-time regional unweighted percentage ILI in the United States by use of an artificial neural network optimized by the improved artificial tree algorithm. The results show that the proposed method is an efficient approach to real-time prediction.
Affiliation(s)
- Hongping Hu
- School of Science, North University of China, Taiyuan, Shanxi, 030051, PR China.
| | - Haiyan Wang
- School of Mathematical and Natural Sciences, Arizona State University, Phoenix, Arizona, USA
| | - Feng Wang
- School of Mathematical and Natural Sciences, Arizona State University, Phoenix, Arizona, USA
| | - Daniel Langley
- School of Mathematical and Natural Sciences, Arizona State University, Phoenix, Arizona, USA
| | - Adrian Avram
- School of Mathematical and Natural Sciences, Arizona State University, Phoenix, Arizona, USA
| | - Maoxing Liu
- School of Science, North University of China, Taiyuan, Shanxi, 030051, PR China
| |
|
36
|
Lee EC, Arab A, Goldlust SM, Viboud C, Grenfell BT, Bansal S. Deploying digital health data to optimize influenza surveillance at national and local scales. PLoS Comput Biol 2018. [PMID: 29513661 PMCID: PMC5858836 DOI: 10.1371/journal.pcbi.1006020] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 01/08/2023]
Abstract
The surveillance of influenza activity is critical to early detection of epidemics and pandemics and the design of disease control strategies. Case reporting through a voluntary network of sentinel physicians is a commonly used method of passive surveillance for monitoring rates of influenza-like illness (ILI) worldwide. Despite its ubiquity, little attention has been given to the processes underlying the observation, collection, and spatial aggregation of sentinel surveillance data, and its subsequent effects on epidemiological understanding. We harnessed the high specificity of diagnosis codes in medical claims from a database that represented 2.5 billion visits from upwards of 120,000 United States healthcare providers each year. Among influenza seasons from 2002-2009 and the 2009 pandemic, we simulated limitations of sentinel surveillance systems such as low coverage and coarse spatial resolution, and performed Bayesian inference to probe the robustness of ecological inference and spatial prediction of disease burden. Our models suggest that a number of socio-environmental factors, in addition to local population interactions, state-specific health policies, as well as sampling effort may be responsible for the spatial patterns in U.S. sentinel ILI surveillance. In addition, we find that biases related to spatial aggregation were accentuated among areas with more heterogeneous disease risk, and sentinel systems designed with fixed reporting locations across seasons provided robust inference and prediction. With the growing availability of health-associated big data worldwide, our results suggest mechanisms for optimizing digital data streams to complement traditional surveillance in developed settings and enhance surveillance opportunities in developing countries. 
Influenza contributes substantially to global morbidity and mortality each year, and epidemiological surveillance for influenza is typically conducted by sentinel physicians and health care providers recruited to report cases of influenza-like illness. While population coverage, representativeness, and geographic distribution are considered during sentinel provider recruitment, systems cannot always achieve these standards due to the administrative burdens of data collection. We present spatial estimates of influenza disease burden across United States counties by leveraging the volume and fine spatial resolution of medical claims data, and existing socio-environmental hypotheses about the determinants of influenza disease burden. Using medical claims as a testbed, this study adds to the literature on the optimization of surveillance system design by considering conditions of limited reporting and spatial aggregation. We highlight the importance of considering sampling biases and reporting locations when interpreting surveillance data, and suggest that local mobility and regional policies may be critical to understanding the spatial distribution of reported influenza-like illness.
Affiliation(s)
- Elizabeth C. Lee
- Department of Biology, Georgetown University, Washington, DC, United States of America
- * E-mail: (ECL); (SB)
| | - Ali Arab
- Department of Mathematics & Statistics, Georgetown University, Washington, DC, United States of America
| | - Sandra M. Goldlust
- Department of Biology, Georgetown University, Washington, DC, United States of America
| | - Cécile Viboud
- Fogarty International Center, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Bryan T. Grenfell
- Fogarty International Center, National Institutes of Health, Bethesda, Maryland, United States of America
- Department of Ecology & Evolutionary Biology and Woodrow Wilson School, Princeton University, Princeton, New Jersey, United States of America
| | - Shweta Bansal
- Department of Biology, Georgetown University, Washington, DC, United States of America
- Fogarty International Center, National Institutes of Health, Bethesda, Maryland, United States of America
- * E-mail: (ECL); (SB)
| |
|
37
|
Dolley S. Big Data's Role in Precision Public Health. Front Public Health 2018; 6:68. [PMID: 29594091 PMCID: PMC5859342 DOI: 10.3389/fpubh.2018.00068] [Citation(s) in RCA: 100] [Impact Index Per Article: 14.3] [Received: 07/25/2017] [Accepted: 02/20/2018] [Indexed: 01/01/2023]
Abstract
Precision public health is an emerging practice to more granularly predict and understand public health risks and customize treatments for more specific and homogeneous subpopulations, often using new data, technologies, and methods. Big data is one element that has consistently helped to achieve these goals, through its ability to deliver to practitioners a volume and variety of structured or unstructured data not previously possible. Big data has enabled more widespread and specific research and trials of stratifying and segmenting populations at risk for a variety of health problems. Examples of success using big data are surveyed in surveillance and signal detection, predicting future risk, targeted interventions, and understanding disease. Using novel big data or big data approaches has risks that remain to be resolved. The continued growth in volume and variety of available data, decreased costs of data capture, and emerging computational methods mean big data success will likely be a required pillar of precision public health into the future. This review article aims to identify the precision public health use cases where big data has added value, identify classes of value that big data may bring, and outline the risks inherent in using big data in precision public health efforts.
|
38
|
Lu FS, Hou S, Baltrusaitis K, Shah M, Leskovec J, Sosic R, Hawkins J, Brownstein J, Conidi G, Gunn J, Gray J, Zink A, Santillana M. Accurate Influenza Monitoring and Forecasting Using Novel Internet Data Streams: A Case Study in the Boston Metropolis. JMIR Public Health Surveill 2018; 4:e4. [PMID: 29317382 PMCID: PMC5780615 DOI: 10.2196/publichealth.8950] [Citation(s) in RCA: 73] [Impact Index Per Article: 10.4] [Received: 09/12/2017] [Revised: 11/08/2017] [Accepted: 11/12/2017] [Indexed: 11/30/2022]
Abstract
Background Influenza outbreaks pose major challenges to public health around the world, leading to thousands of deaths a year in the United States alone. Accurate systems that track influenza activity at the city level are necessary to provide actionable information that can be used for clinical, hospital, and community outbreak preparation. Objective Although Internet-based real-time data sources such as Google searches and tweets have been successfully used to produce influenza activity estimates ahead of traditional health care–based systems at national and state levels, influenza tracking and forecasting at finer spatial resolutions, such as the city level, remain an open question. Our study aimed to present a precise, near real-time methodology capable of producing influenza estimates ahead of those collected and published by the Boston Public Health Commission (BPHC) for the Boston metropolitan area. This approach has great potential to be extended to other cities with access to similar data sources. Methods We first tested the ability of Google searches, Twitter posts, electronic health records, and a crowd-sourced influenza reporting system to detect influenza activity in the Boston metropolis separately. We then adapted a multivariate dynamic regression method named ARGO (autoregression with general online information), designed for tracking influenza at the national level, and showed that it effectively uses the above data sources to monitor and forecast influenza at the city level 1 week ahead of the current date. Finally, we presented an ensemble-based approach capable of combining information from models based on multiple data sources to more robustly nowcast as well as forecast influenza activity in the Boston metropolitan area. The performances of our models were evaluated in an out-of-sample fashion over 4 influenza seasons within 2012-2016, as well as a holdout validation period from 2016 to 2017. 
Results Our ensemble-based methods incorporating information from diverse models based on multiple data sources, including ARGO, produced the most robust and accurate results. The observed Pearson correlations between our out-of-sample flu activity estimates and those historically reported by the BPHC were 0.98 in nowcasting influenza and 0.94 in forecasting influenza 1 week ahead of the current date. Conclusions We show that information from Internet-based data sources, when combined using an informed, robust methodology, can be effectively used as early indicators of influenza activity at fine geographic resolutions.
Affiliation(s)
- Fred Sun Lu
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
| | - Suqin Hou
- Harvard Chan School of Public Health, Harvard University, Boston, MA, United States
| | - Kristin Baltrusaitis
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States
| | - Manan Shah
- Computer Science Department, Stanford University, Stanford, CA, United States
| | - Jure Leskovec
- Computer Science Department, Stanford University, Stanford, CA, United States.,Chan Zuckerberg Biohub, San Francisco, CA, United States
| | - Rok Sosic
- Computer Science Department, Stanford University, Stanford, CA, United States
| | - Jared Hawkins
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States.,Department of Pediatrics, Harvard Medical School, Boston, MA, United States
| | - John Brownstein
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States.,Department of Pediatrics, Harvard Medical School, Boston, MA, United States
| | | | - Julia Gunn
- Boston Public Health Commission, Boston, MA, United States
| | - Josh Gray
- athenaResearch, athenahealth, Watertown, MA, United States
| | - Anna Zink
- athenaResearch, athenahealth, Watertown, MA, United States
| | - Mauricio Santillana
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States.,Department of Pediatrics, Harvard Medical School, Boston, MA, United States
| |
|
39
|
Brownstein JS, Chu S, Marathe A, Marathe MV, Nguyen AT, Paolotti D, Perra N, Perrotta D, Santillana M, Swarup S, Tizzoni M, Vespignani A, Vullikanti AKS, Wilson ML, Zhang Q. Combining Participatory Influenza Surveillance with Modeling and Forecasting: Three Alternative Approaches. JMIR Public Health Surveill 2017; 3:e83. [PMID: 29092812 PMCID: PMC5688248 DOI: 10.2196/publichealth.7344] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 01/17/2017] [Revised: 04/06/2017] [Accepted: 10/09/2017] [Indexed: 11/13/2022]
Abstract
BACKGROUND Influenza outbreaks affect millions of people every year, and their surveillance is usually carried out in developed countries through a network of sentinel doctors who report the weekly number of influenza-like illness cases observed among the visited patients. Monitoring and forecasting the evolution of these outbreaks supports decision makers in designing effective interventions and allocating resources to mitigate their impact. OBJECTIVE Describe the existing participatory surveillance approaches that have been used for modeling and forecasting of the seasonal influenza epidemic, and how they can help strengthen real-time epidemic science and provide a more rigorous understanding of epidemic conditions. METHODS We describe three different participatory surveillance systems, WISDM (Widely Internet Sourced Distributed Monitoring), Influenzanet and Flu Near You (FNY), and show how modeling and simulation can be or has been combined with participatory disease surveillance to: i) measure the non-response bias in a participatory surveillance sample using WISDM; and ii) nowcast and forecast influenza activity in different parts of the world (using Influenzanet and Flu Near You). RESULTS WISDM-based results measure the participatory and sample bias for three epidemic metrics (attack rate, peak infection rate, and time-to-peak) and find the participatory bias to be the largest component of the total bias. The Influenzanet platform shows that digital participatory surveillance data combined with a realistic data-driven epidemiological model can provide both short-term and long-term forecasts of epidemic intensities, and the ground truth data lie within the 95 percent confidence intervals for most weeks. The statistical accuracy of the ensemble forecasts increases as the season progresses. The Flu Near You platform shows that participatory surveillance data provide accurate short-term flu activity forecasts and influenza activity predictions.
The correlation of the HealthMap Flu Trends estimates with the observed CDC ILI rates is 0.99 for 2013-2015. Additional data sources lead to an error reduction of about 40% when compared to the estimates of the model that only incorporates CDC historical information. CONCLUSIONS While the advantages of participatory surveillance, compared to traditional surveillance, include its timeliness, lower costs, and broader reach, it is limited by a lack of control over the characteristics of the population sample. Modeling and simulation can help overcome this limitation as well as provide real-time and long-term forecasting of influenza activity in data-poor parts of the world.
Affiliation(s)
- John S Brownstein
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States.,Computational Epidemiology Group, Division of Emergency Medicine, Boston Children's Hospital, Boston, MA, United States.,Harvard Medical School, Boston, MA, United States
| | - Shuyu Chu
- Network Dynamics and Simulation Science Laboratory, Biocomplexity Institute, Virginia Tech, Blacksburg, VA, United States
| | - Achla Marathe
- Network Dynamics and Simulation Science Laboratory, Biocomplexity Institute, Virginia Tech, Blacksburg, VA, United States
| | - Madhav V Marathe
- Network Dynamics and Simulation Science Laboratory, Biocomplexity Institute, Virginia Tech, Blacksburg, VA, United States
| | - Andre T Nguyen
- Computational Epidemiology Group, Division of Emergency Medicine, Boston Children's Hospital, Boston, MA, United States.,Booz Allen Hamilton, Boston, MA, United States
| | - Daniela Paolotti
- Computational Epidemiology Laboratory, Institute for Scientific Interchange, Turin, Italy
| | - Nicola Perra
- Centre for Business Networks Analysis, University of Greenwich, London, United Kingdom
| | - Daniela Perrotta
- Computational Epidemiology Laboratory, Institute for Scientific Interchange, Turin, Italy
| | - Mauricio Santillana
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States.,Computational Epidemiology Group, Division of Emergency Medicine, Boston Children's Hospital, Boston, MA, United States.,Harvard Medical School, Boston, MA, United States
| | - Samarth Swarup
- Network Dynamics and Simulation Science Laboratory, Biocomplexity Institute, Virginia Tech, Blacksburg, VA, United States
| | - Michele Tizzoni
- Computational Epidemiology Laboratory, Institute for Scientific Interchange, Turin, Italy
| | - Alessandro Vespignani
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA, United States
| | - Anil Kumar S Vullikanti
- Network Dynamics and Simulation Science Laboratory, Biocomplexity Institute, Virginia Tech, Blacksburg, VA, United States
| | - Mandy L Wilson
- Network Dynamics and Simulation Science Laboratory, Biocomplexity Institute, Virginia Tech, Blacksburg, VA, United States
| | - Qian Zhang
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA, United States
| |
|
40
|
Abstract
Objectives: To summarize current research in the field of Public Health and Epidemiology Informatics. Methods: The complete 2016 literature concerning public health and epidemiology informatics has been searched in PubMed and Web of Science, and the returned references were reviewed by the two section editors to select 14 candidate best papers. These papers were then peer-reviewed by external reviewers to allow the editorial team an enlightened selection of the best papers. Results: Among the 829 references retrieved from PubMed and Web of Science, three were finally selected as best papers. The first one compares Google, Twitter, and Wikipedia as tools for Influenza surveillance. The second paper presents a Geographic Knowledge-Based Model for mapping suitable areas for Rift Valley fever transmission in Eastern Africa. The last paper evaluates the factors associated with the visit of Facebook pages devoted to Public Health Communication. Conclusions: Surveillance is still a productive topic in public health informatics but other very important topics in public health are appearing.
|
41
|
Petersen J, Simons H, Patel D, Freedman J. Early detection of perceived risk among users of a UK travel health website compared with internet search activity and media coverage during the 2015-2016 Zika virus outbreak: an observational study. BMJ Open 2017; 7:e015831. [PMID: 28860226 PMCID: PMC5589019 DOI: 10.1136/bmjopen-2017-015831] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/03/2022]
Abstract
OBJECTIVES The Zika virus (ZIKV) outbreak in the Americas in 2015-2016 posed a novel global threat due to the association with congenital malformations and its rapid spread. Timely information about the spread of the disease was paramount to public health bodies issuing travel advisories. This paper looks at the online interaction with a national travel health website during the outbreak and compares this to trends in internet searches and news media output. METHODS Time trends were created for weekly views of ZIKV-related pages on a UK travel health website, relative search volumes for 'Zika' on Google UK, ZIKV-related items aggregated by Google UK News and rank of ZIKV travel advisories among all other pages between 15 November 2015 and 20 August 2016. RESULTS Time trends in traffic to the travel health website corresponded with Google searches, but less so with media items due to intense coverage of the Rio Olympics. Travel advisories for pregnant women were issued from 7 December 2015 and began to increase in popularity (rank) from early January 2016, weeks before a surge in interest as measured by Google searches/news items at the end of January 2016. CONCLUSIONS The study showed an amplification of perceived risk among users of a national travel health website weeks before the initial surge in public interest. This suggests a potential value for tools to detect changes in online information seeking behaviours for predicting periods of high demand where the routine capability of travel health services could be exceeded.
Affiliation(s)
- Jakob Petersen
  - National Travel Health Network and Centre, University College London Hospital NHS Foundation Trust, London, UK
- Hilary Simons
  - National Travel Health Network and Centre, Liverpool School of Tropical Medicine, Liverpool, UK
- Dipti Patel
  - National Travel Health Network and Centre, University College London Hospital NHS Foundation Trust, London, UK
- Joanne Freedman
  - Travel and Migrant Health Section, Public Health England, London, UK
42
Yang S, Santillana M, Brownstein JS, Gray J, Richardson S, Kou SC. Using electronic health records and Internet search information for accurate influenza forecasting. BMC Infect Dis 2017; 17:332. [PMID: 28482810 PMCID: PMC5423019 DOI: 10.1186/s12879-017-2424-7] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 01/09/2017] [Accepted: 04/26/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Accurate influenza activity forecasting helps public health officials prepare and allocate resources for unusual influenza activity. Traditional flu surveillance systems, such as the Centers for Disease Control and Prevention's (CDC) influenza-like illness reports, lag behind real time by 1 to 2 weeks, whereas information contained in cloud-based electronic health records (EHR) and in Internet users' search activity is typically available in near real time. We present a method that combines the information from these two data sources with historical flu activity to produce national flu forecasts for the United States up to 4 weeks ahead of the publication of CDC's flu reports.
METHODS: We extend a method originally designed to track flu using Google searches, named ARGO, to combine information from EHR and Internet searches with historical flu activity. Our regularized multivariate regression model dynamically selects the most appropriate variables for flu prediction every week. The model is assessed for the flu seasons within the period 2013-2016 using multiple metrics, including root mean squared error (RMSE).
RESULTS: Our method reduces the RMSE of the publicly available alternative method (HealthMap Flu Trends) by 33%, 20%, 17%, and 21% for the four time horizons (real-time and 1, 2, and 3 weeks ahead), respectively. These accuracy improvements are statistically significant at the 5% level. Our real-time estimates correctly identified the peak timing and magnitude of the studied flu seasons.
CONCLUSIONS: Our method significantly reduces the prediction error compared with historical publicly available Internet-based prediction systems, demonstrating that (1) the method used to combine data sources is as important as data quality; and (2) effectively extracting information from a cloud-based EHR and Internet search activity leads to accurate flu forecasts.
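The RESULTS figures above are relative RMSE reductions against a baseline forecaster. As a minimal sketch of how such a percentage can be computed (the function names and the weekly values below are hypothetical illustrations, not data or code from the study):

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def rmse_reduction_pct(candidate, baseline, observed):
    """Percentage by which the candidate's RMSE falls below the baseline's."""
    base = rmse(baseline, observed)
    return 100.0 * (base - rmse(candidate, observed)) / base


# Made-up weekly influenza-like illness percentages (assumption, for illustration only).
observed = [1.0, 2.0, 3.0, 4.0]
baseline = [2.0, 3.0, 4.0, 5.0]    # off by 1.0 every week, so RMSE is 1.0
candidate = [1.5, 2.5, 3.5, 4.5]   # off by 0.5 every week, so RMSE is 0.5

print(rmse_reduction_pct(candidate, baseline, observed))  # 50.0
```

In this reading, a reported reduction of 33% would mean the candidate's RMSE is about two thirds of the baseline's over the same evaluation weeks.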
Affiliation(s)
- Shihao Yang
  - Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, MA, 02138, USA
- Mauricio Santillana
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02215, USA
  - Harvard Medical School, Boston, MA, 02115, USA
- John S Brownstein
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02215, USA
  - Harvard Medical School, Boston, MA, 02115, USA
- Josh Gray
  - AthenaResearch at athenahealth, Watertown, MA, 02472, USA
- S C Kou
  - Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, MA, 02138, USA
43
McGough SF, Brownstein JS, Hawkins JB, Santillana M. Forecasting Zika Incidence in the 2016 Latin America Outbreak Combining Traditional Disease Surveillance with Search, Social Media, and News Report Data. PLoS Negl Trop Dis 2017; 11:e0005295. [PMID: 28085877 PMCID: PMC5268704 DOI: 10.1371/journal.pntd.0005295] [Citation(s) in RCA: 97] [Impact Index Per Article: 12.1] [Received: 09/20/2016] [Revised: 01/26/2017] [Accepted: 01/02/2017] [Indexed: 11/18/2022]
Abstract
BACKGROUND: Over 400,000 people across the Americas are thought to have been infected with Zika virus as a consequence of the 2015-2016 Latin American outbreak. Official government-led case count data in Latin America are typically delayed by several weeks, making it difficult to track the disease in a timely manner. Thus, timely disease tracking systems are needed to design and assess interventions to mitigate disease transmission.
METHODOLOGY/PRINCIPAL FINDINGS: We combined information from Zika-related Google searches, Twitter microblogs, and the HealthMap digital surveillance system with historical Zika suspected case counts to track and predict estimates of suspected weekly Zika cases during the 2015-2016 Latin American outbreak, up to three weeks ahead of the publication of official case data. We evaluated the predictive power of these data and used a dynamic multivariable approach to retrospectively produce predictions of weekly suspected cases for five countries: Colombia, El Salvador, Honduras, Venezuela, and Martinique. Models that combined Google data (and Twitter data where available) with autoregressive information showed the best out-of-sample predictive accuracy for 1-week-ahead predictions, whereas models that used only Google and Twitter data typically performed best for 2- and 3-week-ahead predictions.
SIGNIFICANCE: Given the significant delay in the release of official government-reported Zika case counts, we show that these Internet-based data streams can be used as timely and complementary ways to assess the dynamics of the outbreak.
Affiliation(s)
- Sarah F. McGough
  - Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
  - Computational Epidemiology Group, Division of Emergency Medicine, Boston Children’s Hospital, Boston, Massachusetts, United States of America
- John S. Brownstein
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
  - Computational Epidemiology Group, Division of Emergency Medicine, Boston Children’s Hospital, Boston, Massachusetts, United States of America
  - Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, United States of America
- Jared B. Hawkins
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
  - Computational Epidemiology Group, Division of Emergency Medicine, Boston Children’s Hospital, Boston, Massachusetts, United States of America
  - Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, United States of America
- Mauricio Santillana
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
  - Computational Epidemiology Group, Division of Emergency Medicine, Boston Children’s Hospital, Boston, Massachusetts, United States of America
  - Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, United States of America
44
Twitter Influenza Surveillance: Quantifying Seasonal Misdiagnosis Patterns and their Impact on Surveillance Estimates. Online J Public Health Inform 2016; 8:e198. [PMID: 28210419 PMCID: PMC5302465 DOI: 10.5210/ojphi.v8i3.7011] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 11/24/2022]
Abstract
BACKGROUND: Influenza (flu) surveillance using Twitter data can potentially save lives and increase efficiency by providing governments and healthcare organizations with greater situational awareness. However, research is needed to determine the impact of Twitter users' misdiagnoses on surveillance estimates.
OBJECTIVE: This study establishes the importance of Twitter users' misdiagnoses by showing that Twitter flu surveillance in the United States failed during the 2011-2012 flu season, estimates the extent of misdiagnoses, and tests several methods for reducing the adverse effects of misdiagnoses.
METHODS: Metrics representing flu prevalence, seasonal misdiagnosis patterns, diagnosis uncertainty, flu symptoms, and noise were produced using Twitter data in conjunction with OpenSextant for geo-inferencing and a maximum entropy classifier for identifying tweets related to illness. These metrics were tested for correlations with World Health Organization (WHO) positive specimen counts of flu from 2011 to 2014.
RESULTS: Twitter flu surveillance erroneously indicated a typical flu season during 2011-2012, even though the flu season peaked three months late, and erroneously indicated plateaus of flu tweets before the 2012-2013 and 2013-2014 flu seasons. Enhancements based on estimates of misdiagnoses removed the erroneous plateaus and increased the Pearson correlation coefficients by 0.04 and 0.23, but failed to correct the 2011-2012 flu season estimate. A rough estimate indicates that approximately 40% of flu tweets reflected misdiagnoses.
CONCLUSIONS: Further research into factors affecting Twitter users' misdiagnoses, in conjunction with data from additional atypical flu seasons, is needed to enable Twitter flu surveillance systems to produce reliable estimates during atypical flu seasons.
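The evaluation above rests on Pearson correlation between a tweet-derived metric and WHO positive specimen counts. A minimal sketch of that coefficient follows; the function name and both weekly series are hypothetical and are not taken from the study:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up weekly counts (assumption, for illustration only).
flu_tweets = [10, 20, 30, 40]
positive_specimens = [1, 2, 3, 4]

r = pearson(flu_tweets, positive_specimens)
print(round(r, 6))
```

A coefficient near 1 means the tweet metric rises and falls with the specimen counts; a plateau of flu tweets ahead of the real season, as the abstract describes, would pull the coefficient down.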