Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Osthus D, Daughton AR, Priedhorsky R. Even a good influenza forecasting model can benefit from internet-based nowcasts, but those benefits are limited. PLoS Comput Biol 2019;15:e1006599. [PMID: 30707689 PMCID: PMC6373968 DOI: 10.1371/journal.pcbi.1006599] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2018] [Revised: 02/13/2019] [Accepted: 10/30/2018] [Indexed: 11/19/2022] Open

For:	Osthus D, Daughton AR, Priedhorsky R. Even a good influenza forecasting model can benefit from internet-based nowcasts, but those benefits are limited. PLoS Comput Biol 2019;15:e1006599. [PMID: 30707689 PMCID: PMC6373968 DOI: 10.1371/journal.pcbi.1006599] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2018] [Revised: 02/13/2019] [Accepted: 10/30/2018] [Indexed: 11/19/2022] Open

Number

Cited by Other Article(s)

Ray EL, Wang Y, Wolfinger RD, Reich NG. Flusion: Integrating multiple data sources for accurate influenza predictions. Epidemics 2025;50:100810. [PMID: 39818098 DOI: 10.1016/j.epidem.2024.100810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2024] [Revised: 09/26/2024] [Accepted: 12/06/2024] [Indexed: 01/18/2025] Open

Abstract

Over the last ten years, the US Centers for Disease Control and Prevention (CDC) has organized an annual influenza forecasting challenge with the motivation that accurate probabilistic forecasts could improve situational awareness and yield more effective public health actions. Starting with the 2021/22 influenza season, the forecasting targets for this challenge have been based on hospital admissions reported in the CDC's National Healthcare Safety Network (NHSN) surveillance system. Reporting of influenza hospital admissions through NHSN began within the last few years, and as such only a limited amount of historical data are available for this target signal. To produce forecasts in the presence of limited data for the target surveillance system, we augmented these data with two signals that have a longer historical record: 1) ILI+, which estimates the proportion of outpatient doctor visits where the patient has influenza; and 2) rates of laboratory-confirmed influenza hospitalizations at a selected set of healthcare facilities. Our model, Flusion, is an ensemble model that combines two machine learning models using gradient boosting for quantile regression based on different feature sets with a Bayesian autoregressive model. The gradient boosting models were trained on all three data signals, while the autoregressive model was trained on only data for the target surveillance signal, NHSN admissions; all three models were trained jointly on data for multiple locations. In each week of the influenza season, these models produced quantiles of a predictive distribution of influenza hospital admissions in each state for the current week and the following three weeks; the ensemble prediction was computed by averaging these quantile predictions. Flusion emerged as the top-performing model in the CDC's influenza prediction challenge for the 2023/24 season. In this article we investigate the factors contributing to Flusion's success, and we find that its strong performance was primarily driven by the use of a gradient boosting model that was trained jointly on data from multiple surveillance signals and multiple locations. These results indicate the value of sharing information across multiple locations and surveillance signals, especially when doing so adds to the pool of available training data.

Collapse

Kim M, Kim Y, Nah K. Predicting seasonal influenza outbreaks with regime shift-informed dynamics for improved public health preparedness. Sci Rep 2024;14:12698. [PMID: 38830955 PMCID: PMC11148101 DOI: 10.1038/s41598-024-63573-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2023] [Accepted: 05/30/2024] [Indexed: 06/05/2024] Open

Al Hossain F, Tonmoy MTH, Nuvvula S, Chapman BP, Gupta RK, Lover AA, Dinglasan RR, Carreiro S, Rahman T. Syndromic surveillance of population-level COVID-19 burden with cough monitoring in a hospital emergency waiting room. Front Public Health 2024;12:1279392. [PMID: 38605877 PMCID: PMC11007176 DOI: 10.3389/fpubh.2024.1279392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2023] [Accepted: 03/11/2024] [Indexed: 04/13/2024] Open

Abstract

Syndromic surveillance is an effective tool for enabling the timely detection of infectious disease outbreaks and facilitating the implementation of effective mitigation strategies by public health authorities. While various information sources are currently utilized to collect syndromic signal data for analysis, the aggregated measurement of cough, an important symptom for many illnesses, is not widely employed as a syndromic signal. With recent advancements in ubiquitous sensing technologies, it becomes feasible to continuously measure population-level cough incidence in a contactless, unobtrusive, and automated manner. In this work, we demonstrate the utility of monitoring aggregated cough count as a syndromic indicator to estimate COVID-19 cases. In our study, we deployed a sensor-based platform (Syndromic Logger) in the emergency room of a large hospital. The platform captured syndromic signals from audio, thermal imaging, and radar, while the ground truth data were collected from the hospital's electronic health record. Our analysis revealed a significant correlation between the aggregated cough count and positive COVID-19 cases in the hospital (Pearson correlation of 0.40, p-value < 0.001). Notably, this correlation was higher than that observed with the number of individuals presenting with fever (ρ = 0.22, p = 0.04), a widely used syndromic signal and screening tool for such diseases. Furthermore, we demonstrate how the data obtained from our Syndromic Logger platform could be leveraged to estimate various COVID-19-related statistics using multiple modeling approaches. Aggregated cough counts and other data, such as people density collected from our platform, can be utilized to predict COVID-19 patient visits related metrics in a hospital waiting room, and SHAP and Gini feature importance-based metrics showed cough count as the important feature for these prediction models. Furthermore, we have shown that predictions based on cough counting outperform models based on fever detection (e.g., temperatures over 39°C), which require more intrusive engagement with the population. Our findings highlight that incorporating cough-counting based signals into syndromic surveillance systems can significantly enhance overall resilience against future public health challenges, such as emerging disease outbreaks or pandemics.

Collapse

Reich NG, Wang Y, Burns M, Ergas R, Cramer EY, Ray EL. Assessing the utility of COVID-19 case reports as a leading indicator for hospitalization forecasting in the United States. Epidemics 2023;45:100728. [PMID: 37976681 PMCID: PMC10871058 DOI: 10.1016/j.epidem.2023.100728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2023] [Revised: 09/29/2023] [Accepted: 11/06/2023] [Indexed: 11/19/2023] Open

Al Hossain F, Tonmoy TH, Nuvvula S, Chapman BP, Gupta RK, Lover AA, Dinglasan RR, Carreiro S, Rahman T. Passive Monitoring of Crowd-Level Cough Counts in Waiting Areas produces Reliable Syndromic Indicator for Total COVID-19 Burden in a Hospital Emergency Clinic. RESEARCH SQUARE 2023:rs.3.rs-3084318. [PMID: 37461489 PMCID: PMC10350162 DOI: 10.21203/rs.3.rs-3084318/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/27/2023]

Abstract

Syndromic surveillance is an effective tool for enabling the timely detection of infectious disease outbreaks and facilitating the implementation of effective mitigation strategies by public health authorities. While various information sources are currently utilized to collect syndromic signal data for analysis, the aggregated measurement of cough, an important symptom for many illnesses, is not widely employed as a syndromic signal. With recent advancements in ubiquitous sensing technologies, it becomes feasible to continuously measure population-level cough incidence in a contactless, unobtrusive, and automated manner. In this work, we demonstrate the utility of monitoring aggregated cough count as a syndromic indicator to estimate COVID-19 cases. In our study, we deployed a sensor-based platform (Syndromic Logger) in the emergency room of a large hospital. The platform captured syndromic signals from audio, thermal imaging, and radar, while the ground truth data were collected from the hospital's electronic health record. Our analysis revealed a significant correlation between the aggregated cough count and positive COVID-19 cases in the hospital (Pearson correlation of 0.40, p-value < 0.001). Notably, this correlation was higher than that observed with the number of individuals presenting with fever (ρ = 0.22, p = 0.04), a widely used syndromic signal and screening tool for such diseases. Furthermore, we demonstrate how the data obtained from our Syndromic Logger platform could be leveraged to estimate various COVID-19-related statistics using multiple modeling approaches. Our findings highlight the efficacy of aggregated cough count as a valuable syndromic indicator associated with the occurrence of COVID-19 cases. Incorporating this signal into syndromic surveillance systems for such diseases can significantly enhance overall resilience against future public health challenges, such as emerging disease outbreaks or pandemics.

Collapse

Beesley LJ, Osthus D, Del Valle SY. Addressing delayed case reporting in infectious disease forecast modeling. PLoS Comput Biol 2022;18:e1010115. [PMID: 35658007 PMCID: PMC9200328 DOI: 10.1371/journal.pcbi.1010115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2021] [Revised: 06/15/2022] [Accepted: 04/18/2022] [Indexed: 11/18/2022] Open

Abstract

Infectious disease forecasting is of great interest to the public health community and policymakers, since forecasts can provide insight into disease dynamics in the near future and inform interventions. Due to delays in case reporting, however, forecasting models may often underestimate the current and future disease burden.

In this paper, we propose a general framework for addressing reporting delay in disease forecasting efforts with the goal of improving forecasts. We propose strategies for leveraging either historical data on case reporting or external internet-based data to estimate the amount of reporting error. We then describe several approaches for adapting general forecasting pipelines to account for under- or over-reporting of cases. We apply these methods to address reporting delay in data on dengue fever cases in Puerto Rico from 1990 to 2009 and to reports of influenza-like illness (ILI) in the United States between 2010 and 2019. Through a simulation study, we compare method performance and evaluate robustness to assumption violations. Our results show that forecasting accuracy and prediction coverage almost always increase when correction methods are implemented to address reporting delay. Some of these methods required knowledge about the reporting error or high quality external data, which may not always be available. Provided alternatives include excluding recently-reported data and performing sensitivity analysis. This work provides intuition and guidance for handling delay in disease case reporting and may serve as a useful resource to inform practical infectious disease forecasting efforts.

The public health community and policymakers are interested in using models to predict future disease rates using information about disease rates in the past. However, our data about the recent past are less reliable than older data, due to a time lag between someone getting sick and their subsequent diagnosis being officially reported. In this paper, we describe strategies to correct reported disease rates from the recent past to account for disease diagnoses that haven’t yet been reported. Using more accurate information about the recent past, we can do a better job predicting what will happen in the future.

Collapse

Osthus D. Fast and accurate influenza forecasting in the United States with Inferno. PLoS Comput Biol 2022;18:e1008651. [PMID: 35100253 PMCID: PMC8830797 DOI: 10.1371/journal.pcbi.1008651] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2020] [Revised: 02/10/2022] [Accepted: 01/02/2022] [Indexed: 01/15/2023] Open

Abstract

Infectious disease forecasting is an emerging field and has the potential to improve public health through anticipatory resource allocation, situational awareness, and mitigation planning. By way of exploring and operationalizing disease forecasting, the U.S. Centers for Disease Control and Prevention (CDC) has hosted FluSight since the 2013/14 flu season, an annual flu forecasting challenge. Since FluSight’s onset, forecasters have developed and improved forecasting models in an effort to provide more timely, reliable, and accurate information about the likely progression of the outbreak. While improving the predictive performance of these forecasting models is often the primary objective, it is also important for a forecasting model to run quickly, facilitating further model development and improvement while providing flexibility when deployed in a real-time setting. In this vein I introduce Inferno, a fast and accurate flu forecasting model inspired by Dante, the top performing model in the 2018/19 FluSight challenge. When pseudoprospectively compared to all models that participated in FluSight 2018/19, Inferno would have placed 2nd in the national and regional challenge as well as the state challenge, behind only Dante. Inferno, however, runs in minutes and is trivially parallelizable, while Dante takes hours to run, representing a significant operational improvement with minimal impact to performance. Forecasting challenges like FluSight should continue to monitor and evaluate how they can be modified and expanded to incentivize the development of forecasting models that benefit public health.

Infectious disease forecasting, if accurate, timely, and reliable, can assist decision makers with resource allocation planning in an attempt to curb the negative impacts of an outbreak. Forecasting challenges, like the U.S. Centers for Disease Control and Prevention’s flu forecasting challenge, FluSight, provide a space for teams to develop and operationalize real-time forecasting models that benefit public health, with weekly forecasts made at the state-level, Health and Human Services region-level, and the United States. The ultimate goal of these models is to produce accurate forecasts within the constraints of the forecasting challenge. Having a forecasting model that runs quickly is also important for future scalability, model development, and operational flexibility. In this paper, I present a fast and accurate flu forecasting model, Inferno. Through retrospective comparisons with FluSight-participating models, Inferno was shown to be a leading forecasting model in the field. Inferno, however, runs in minutes not hours, as other leading forecasting models do. This reduction in runtime constitutes an advancement in flu forecasting, positioning Inferno to scale to more granular geographic units, like counties or health care providers.

Collapse

McAndrew T, Reich NG. Adaptively stacking ensembles for influenza forecasting. Stat Med 2021;40:6931-6952. [PMID: 34647627 PMCID: PMC8671371 DOI: 10.1002/sim.9219] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2021] [Revised: 09/13/2021] [Accepted: 09/14/2021] [Indexed: 01/01/2023]

Osthus D, Moran KR. Multiscale influenza forecasting. Nat Commun 2021;12:2991. [PMID: 34016992 PMCID: PMC8137955 DOI: 10.1038/s41467-021-23234-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2019] [Accepted: 04/16/2021] [Indexed: 11/09/2022] Open

Gibson GC, Moran KR, Reich NG, Osthus D. Improving probabilistic infectious disease forecasting through coherence. PLoS Comput Biol 2021;17:e1007623. [PMID: 33406068 PMCID: PMC7837472 DOI: 10.1371/journal.pcbi.1007623] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2019] [Revised: 01/26/2021] [Accepted: 09/14/2020] [Indexed: 11/19/2022] Open

Abstract

With an estimated $10.4 billion in medical costs and 31.4 million outpatient visits each year, influenza poses a serious burden of disease in the United States. To provide insights and advance warning into the spread of influenza, the U.S. Centers for Disease Control and Prevention (CDC) runs a challenge for forecasting weighted influenza-like illness (wILI) at the national and regional level. Many models produce independent forecasts for each geographical unit, ignoring the constraint that the national wILI is a weighted sum of regional wILI, where the weights correspond to the population size of the region. We propose a novel algorithm that transforms a set of independent forecast distributions to obey this constraint, which we refer to as probabilistically coherent. Enforcing probabilistic coherence led to an increase in forecast skill for 79% of the models we tested over multiple flu seasons, highlighting the importance of respecting the forecasting system’s geographical hierarchy.

Seasonal influenza causes a significant public health burden nationwide. Accurate influenza forecasting may help public health officials allocate resources and plan responses to emerging outbreaks. The U.S. Centers for Disease Control and Prevention (CDC) reports influenza data at multiple geographical units, including regionally and nationally, where the national data are by construction a weighted sum of the regional data. In an effort to improve influenza forecast accuracy across all models submitted to the CDC’s annual flu forecasting challenge, we examined the effect of imposing this geographical constraint on the set of independent forecasts, made publicly available by the CDC. We developed a novel method to transform forecast densities to obey the geographical constraint that respects the correlation structure between geographical units. This method showed consistent improvement across 79% of models and that held when stratified by targets and test seasons. Our method can be applied to other forecasting systems both within and outside an infectious disease context that have a geographical hierarchy.

Collapse

Daughton AR, Chunara R, Paul MJ. Comparison of Social Media, Syndromic Surveillance, and Microbiologic Acute Respiratory Infection Data: Observational Study. JMIR Public Health Surveill 2020;6:e14986. [PMID: 32329741 PMCID: PMC7210500 DOI: 10.2196/14986] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2019] [Revised: 09/27/2019] [Accepted: 02/09/2020] [Indexed: 11/30/2022] Open

Abstract

Background

Internet data can be used to improve infectious disease models. However, the representativeness and individual-level validity of internet-derived measures are largely unexplored as this requires ground truth data for study.

Objective

This study sought to identify relationships between Web-based behaviors and/or conversation topics and health status using a ground truth, survey-based dataset.

Methods

This study leveraged a unique dataset of self-reported surveys, microbiological laboratory tests, and social media data from the same individuals toward understanding the validity of individual-level constructs pertaining to influenza-like illness in social media data. Logistic regression models were used to identify illness in Twitter posts using user posting behaviors and topic model features extracted from users’ tweets.

Results

Of 396 original study participants, only 81 met the inclusion criteria for this study. Of these participants’ tweets, we identified only two instances that were related to health and occurred within 2 weeks (before or after) of a survey indicating symptoms. It was not possible to predict when participants reported symptoms using features derived from topic models (area under the curve [AUC]=0.51; P=.38), though it was possible using behavior features, albeit with a very small effect size (AUC=0.53; P≤.001). Individual symptoms were also generally not predictable either. The study sample and a random sample from Twitter are predictably different on held-out data (AUC=0.67; P≤.001), meaning that the content posted by people who participated in this study was predictably different from that posted by random Twitter users. Individuals in the random sample and the GoViral sample used Twitter with similar frequencies (similar @ mentions, number of tweets, and number of retweets; AUC=0.50; P=.19).

Conclusions

To our knowledge, this is the first instance of an attempt to use a ground truth dataset to validate infectious disease observations in social media data. The lack of signal, the lack of predictability among behaviors or topics, and the demonstrated volunteer bias in the study population are important findings for the large and growing body of disease surveillance using internet-sourced data.

Collapse

Romero-Alvarez D, Parikh N, Osthus D, Martinez K, Generous N, Del Valle S, Manore CA. Google Health Trends performance reflecting dengue incidence for the Brazilian states. BMC Infect Dis 2020;20:252. [PMID: 32228508 PMCID: PMC7104526 DOI: 10.1186/s12879-020-04957-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2019] [Accepted: 03/10/2020] [Indexed: 12/14/2022] Open

Abstract

BACKGROUND

Dengue fever is a mosquito-borne infection transmitted by Aedes aegypti and mainly found in tropical and subtropical regions worldwide. Since its re-introduction in 1986, Brazil has become a hotspot for dengue and has experienced yearly epidemics. As a notifiable infectious disease, Brazil uses a passive epidemiological surveillance system to collect and report cases; however, dengue burden is underestimated. Thus, Internet data streams may complement surveillance activities by providing real-time information in the face of reporting lags.

METHODS

We analyzed 19 terms related to dengue using Google Health Trends (GHT), a free-Internet data-source, and compared it with weekly dengue incidence between 2011 to 2016. We correlated GHT data with dengue incidence at the national and state-level for Brazil while using the adjusted R squared statistic as primary outcome measure (0/1). We used survey data on Internet access and variables from the official census of 2010 to identify where GHT could be useful in tracking dengue dynamics. Finally, we used a standardized volatility index on dengue incidence and developed models with different variables with the same objective.

RESULTS

From the 19 terms explored with GHT, only seven were able to consistently track dengue. From the 27 states, only 12 reported an adjusted R squared higher than 0.8; these states were distributed mainly in the Northeast, Southeast, and South of Brazil. The usefulness of GHT was explained by the logarithm of the number of Internet users in the last 3 months, the total population per state, and the standardized volatility index.

CONCLUSIONS

The potential contribution of GHT in complementing traditional established surveillance strategies should be analyzed in the context of geographical resolutions smaller than countries. For Brazil, GHT implementation should be analyzed in a case-by-case basis. State variables including total population, Internet usage in the last 3 months, and the standardized volatility index could serve as indicators determining when GHT could complement dengue state level surveillance in other countries.

Collapse

Al Hossain F, Lover AA, Corey GA, Reich NG, Rahman T. FluSense: A Contactless Syndromic Surveillance Platform for Influenza-Like Illness in Hospital Waiting Areas. PROCEEDINGS OF THE ACM ON INTERACTIVE, MOBILE, WEARABLE AND UBIQUITOUS TECHNOLOGIES 2020;4:1. [PMID: 35846237 PMCID: PMC9286491 DOI: 10.1145/3381014] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/17/2023]

Lu J, Meyer S. Forecasting Flu Activity in the United States: Benchmarking an Endemic-Epidemic Beta Model. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020;17:E1381. [PMID: 32098038 PMCID: PMC7068443 DOI: 10.3390/ijerph17041381] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/31/2019] [Revised: 02/07/2020] [Accepted: 02/15/2020] [Indexed: 11/25/2022]

O'Leary DE, Storey VC. A Google–Wikipedia–Twitter Model as a Leading Indicator of the Numbers of Coronavirus Deaths. INTELLIGENT SYSTEMS IN ACCOUNTING, FINANCE AND MANAGEMENT 2020;27. [PMCID: PMC7646638 DOI: 10.1002/isaf.1482] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/09/2023]

Rangarajan P, Mody SK, Marathe M. Forecasting dengue and influenza incidences using a sparse representation of Google trends, electronic health records, and time series data. PLoS Comput Biol 2019;15:e1007518. [PMID: 31751346 PMCID: PMC6894887 DOI: 10.1371/journal.pcbi.1007518] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2019] [Revised: 12/05/2019] [Accepted: 10/29/2019] [Indexed: 12/20/2022] Open

Abstract

Dengue and influenza-like illness (ILI) are two of the leading causes of viral infection in the world and it is estimated that more than half the world’s population is at risk for developing these infections. It is therefore important to develop accurate methods for forecasting dengue and ILI incidences. Since data from multiple sources (such as dengue and ILI case counts, electronic health records and frequency of multiple internet search terms from Google Trends) can improve forecasts, standard time series analysis methods are inadequate to estimate all the parameter values from the limited amount of data available if we use multiple sources. In this paper, we use a computationally efficient implementation of the known variable selection method that we call the Autoregressive Likelihood Ratio (ARLR) method. This method combines sparse representation of time series data, electronic health records data (for ILI) and Google Trends data to forecast dengue and ILI incidences. This sparse representation method uses an algorithm that maximizes an appropriate likelihood ratio at every step. Using numerical experiments, we demonstrate that our method recovers the underlying sparse model much more accurately than the lasso method. We apply our method to dengue case count data from five countries/states: Brazil, Mexico, Singapore, Taiwan, and Thailand and to ILI case count data from the United States. Numerical experiments show that our method outperforms existing time series forecasting methods in forecasting the dengue and ILI case counts. In particular, our method gives a 18 percent forecast error reduction over a leading method that also uses data from multiple sources. It also performs better than other methods in predicting the peak value of the case count and the peak time.

Dengue and influenza-like illness (ILI) are leading causes of viral infection in the world and hence it is important to develop accurate methods for forecasting their incidence. We use Autoregressive Likelihood Ratio method, which is a computationally efficient implementation of the variable selection method, in order to obtain a sparse (non-lasso) representation of time series, Google Trends and electronic health records (for ILI) data. This method is used to forecast dengue incidence in five countries/states and ILI incidence in USA. We show that this method outperforms existing time series methods in forecasting these diseases. The method is general and can also be used to forecast other diseases.

Collapse

Estimating influenza incidence using search query deceptiveness and generalized ridge regression. PLoS Comput Biol 2019;15:e1007165. [PMID: 31574086 PMCID: PMC6771994 DOI: 10.1371/journal.pcbi.1007165] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2018] [Accepted: 05/31/2019] [Indexed: 11/22/2022] Open

Abstract

Seasonal influenza is a sometimes surprisingly impactful disease, causing thousands of deaths per year along with much additional morbidity. Timely knowledge of the outbreak state is valuable for managing an effective response. The current state of the art is to gather this knowledge using in-person patient contact. While accurate, this is time-consuming and expensive. This has motivated inquiry into new approaches using internet activity traces, based on the theory that lay observations of health status lead to informative features in internet data. These approaches risk being deceived by activity traces having a coincidental, rather than informative, relationship to disease incidence; to our knowledge, this risk has not yet been quantitatively explored. We evaluated both simulated and real activity traces of varying deceptiveness for influenza incidence estimation using linear regression. We found that deceptiveness knowledge does reduce error in such estimates, that it may help automatically-selected features perform as well or better than features that require human curation, and that a semantic distance measure derived from the Wikipedia article category tree serves as a useful proxy for deceptiveness. This suggests that disease incidence estimation models should incorporate not only data about how internet features map to incidence but also additional data to estimate feature deceptiveness. By doing so, we may gain one more step along the path to accurate, reliable disease incidence estimation using internet data. This capability would improve public health by decreasing the cost and increasing the timeliness of such estimates.

While often considered a minor infection, seasonal flu kills many thousands of people every year and sickens millions more. The more accurate and up-to-date public health officials’ view of what the seasonal outbreak is, the more effectively the outbreak can be addressed. Currently, this knowledge is based on collating information on patients who enter the health care system. This approach is accurate, but it’s also expensive and slow. Researchers hope that new approaches based on examining what people do and share on the internet may work more cheaply and quickly. Some internet activity, however, has a history of correspondence with disease activity, but this relationship is coincidental rather than informative. For example, some prior work has found a correspondence between zombie-related social media messages and the flu season, so one could plausibly build accurate flu estimates using such messages that are then fooled by the appearance of a new zombie movie. We tested flu estimation models that incorporate information about this risk of deception, finding that knowledge of deceptiveness does indeed produce more accurate estimates; we also identified a method to estimate deceptiveness. Our results suggest that estimation models used in practice should use information about both how inputs maps to disease activity and also what the potential of each input to be deceptive is. This may get us one step closer to accurate, reliable disease estimates based on internet data, which would improve public health by making those estimates faster and cheaper.

Collapse

On the multibin logarithmic score used in the FluSight competitions. Proc Natl Acad Sci U S A 2019;116:20809-20810. [PMID: 31558612 DOI: 10.1073/pnas.1912147116] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Kandula S, Shaman J. Reappraising the utility of Google Flu Trends. PLoS Comput Biol 2019;15:e1007258. [PMID: 31374088 PMCID: PMC6693776 DOI: 10.1371/journal.pcbi.1007258] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2018] [Revised: 08/14/2019] [Accepted: 07/09/2019] [Indexed: 11/18/2022] Open

Abstract

Estimation of influenza-like illness (ILI) using search trends activity was intended to supplement traditional surveillance systems, and was a motivation behind the development of Google Flu Trends (GFT). However, several studies have previously reported large errors in GFT estimates of ILI in the US. Following recent release of time-stamped surveillance data, which better reflects real-time operational scenarios, we reanalyzed GFT errors. Using three data sources-GFT: an archive of weekly ILI estimates from Google Flu Trends; ILIf: fully-observed ILI rates from ILINet; and, ILIp: ILI rates available in real-time based on partial reporting-five influenza seasons were analyzed and mean square errors (MSE) of GFT and ILIp as estimates of ILIf were computed. To correct GFT errors, a random forest regression model was built with ILI and GFT rates from the previous three weeks as predictors. An overall reduction in error of 44% was observed and the errors of the corrected GFT are lower than those of ILIp. An 80% reduction in error during 2012/13, when GFT had large errors, shows that extreme failures of GFT could have been avoided. Using autoregressive integrated moving average (ARIMA) models, one- to four-week ahead forecasts were generated with two separate data streams: ILIp alone, and with both ILIp and corrected GFT. At all forecast targets and seasons, and for all but two regions, inclusion of GFT lowered MSE. Results from two alternative error measures, mean absolute error and mean absolute proportional error, were largely consistent with results from MSE. Taken together these findings provide an error profile of GFT in the US, establish strong evidence for the adoption of search trends based 'nowcasts' in influenza forecast systems, and encourage reevaluation of the utility of this data source in diverse domains.

Collapse