1
Song TH, Clemente L, Pan X, Jang J, Santillana M, Lee K. Fine-grained forecasting of COVID-19 trends at the county level in the United States. NPJ Digit Med 2025; 8:204. [PMID: 40216974; PMCID: PMC11992165; DOI: 10.1038/s41746-025-01606-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Accepted: 03/30/2025] [Indexed: 04/14/2025] Open
Abstract
The novel coronavirus (COVID-19) pandemic has had a devastating global impact, profoundly affecting daily life, healthcare systems, and public health infrastructure. Despite the availability of treatments and vaccines, hospitalizations and deaths continue. Real-time surveillance of infection trends supports resource allocation and mitigation strategies, but reliable forecasting remains a challenge. While deep learning has advanced time-series forecasting, its effectiveness relies on large datasets, a significant obstacle given the pandemic's evolving nature. Most models use national or state-level data, limiting both dataset size and the granularity of insights. To address this, we propose the Fine-Grained Infection Forecast Network (FIGI-Net), a stacked bidirectional LSTM structure designed to leverage county-level data to produce daily forecasts up to two weeks in advance. FIGI-Net outperforms existing models, accurately predicting sudden changes such as new outbreaks or peaks, a capability many state-of-the-art models lack. This approach could enhance public health responses and outbreak preparedness.
Affiliation(s)
- Tzu-Hsi Song
  - Vascular Biology Program and Department of Surgery, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Leonardo Clemente
  - Department of Physics and Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, 02115, USA
- Xiang Pan
  - Vascular Biology Program and Department of Surgery, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Junbong Jang
  - Vascular Biology Program and Department of Surgery, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Mauricio Santillana
  - Department of Physics and Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, 02115, USA
- Kwonmoo Lee
  - Vascular Biology Program and Department of Surgery, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
2
Meyer AG, Lu F, Clemente L, Santillana M. A prospective real-time transfer learning approach to estimate influenza hospitalizations with limited data. Epidemics 2025; 50:100816. [PMID: 39985955; DOI: 10.1016/j.epidem.2025.100816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Revised: 11/26/2024] [Accepted: 01/22/2025] [Indexed: 02/24/2025] Open
Abstract
Accurate, real-time forecasts of influenza hospitalizations would facilitate prospective resource allocation and public health preparedness. State-of-the-art machine learning methods are a promising approach to produce such forecasts, but they require extensive historical data to be properly trained. Unfortunately, data on influenza hospitalizations, for the 50 states in the United States, are only available since the beginning of 2020. In addition, the data are far from perfect as they were under-reported for several months before health systems began consistently submitting their data. To address these issues, we propose a transfer learning approach. We extend the currently available two-season dataset for state-level influenza hospitalizations by an additional ten seasons. Our method leverages influenza-like illness (ILI) data to infer historical estimates of influenza hospitalizations. This data augmentation enables the implementation of advanced machine learning techniques, multi-horizon training, and an ensemble of models to improve hospitalization forecasts. We evaluated the performance of our machine learning approaches by prospectively producing forecasts for future weeks and submitting them in real time to the Centers for Disease Control and Prevention FluSight challenges during two seasons: 2022-2023 and 2023-2024. Our methodology demonstrated good accuracy and reliability, achieving a fourth place finish (among 20 participating teams) in the 2022-23 and a second place finish (among 20 participating teams) in the 2023-24 CDC FluSight challenges. Our findings highlight the utility of data augmentation and knowledge transfer in the application of machine learning models to public health surveillance where only limited historical data is available.
Affiliation(s)
- Austin G Meyer
  - Machine Intelligence Group for the betterment of Health and the Environment, Network Science Institute, Northeastern University, Boston, MA, USA; Department of Physics, Northeastern University, Boston, MA, USA; Department of Pediatrics, Baylor Scott and White Health, Temple, TX, USA
- Fred Lu
  - Machine Intelligence Group for the betterment of Health and the Environment, Network Science Institute, Northeastern University, Boston, MA, USA; Department of Physics, Northeastern University, Boston, MA, USA
- Leonardo Clemente
  - Machine Intelligence Group for the betterment of Health and the Environment, Network Science Institute, Northeastern University, Boston, MA, USA; Department of Physics, Northeastern University, Boston, MA, USA
- Mauricio Santillana
  - Machine Intelligence Group for the betterment of Health and the Environment, Network Science Institute, Northeastern University, Boston, MA, USA; Department of Physics, Northeastern University, Boston, MA, USA; Center for Communicable Disease Dynamics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
3
Andronico A, Paireau J, Cauchemez S. Integrating information from historical data into mechanistic models for influenza forecasting. PLoS Comput Biol 2024; 20:e1012523. [PMID: 39475955; PMCID: PMC11524484; DOI: 10.1371/journal.pcbi.1012523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 09/27/2024] [Indexed: 11/02/2024] Open
Abstract
Seasonal influenza causes significant annual morbidity and mortality worldwide. In France, it is estimated that, on average, 2 million individuals consult their GP for influenza-like-illness (ILI) every year. Traditionally, mathematical models used for epidemic forecasting can either include parameters capturing the infection process (mechanistic or compartmental models) or rely on time series analysis approaches that do not make mechanistic assumptions (statistical or phenomenological models). While the latter make extensive use of past epidemic data, mechanistic models are usually independently initialized in each season. As a result, forecasts from such models can contain trajectories that are vastly different from past epidemics. We developed a mechanistic model that takes into account epidemic data from training seasons when producing forecasts. The parameters of the model are estimated via a first particle filter running on the observed data. A second particle filter is then used to produce forecasts compatible with epidemic trajectories from the training set. The model was calibrated and tested on 35 years' worth of surveillance data from the French Sentinelles Network, representing the weekly number of patients consulting for ILI over the period 1985-2019. Our results show that the new method improves upon standard mechanistic approaches. In particular, when retrospectively tested on the available data, our model provides increased accuracy for short-term forecasts (from one to four weeks into the future) and peak timing and intensity. Our new approach for epidemic forecasting allows the integration of key strengths of the statistical approach into the mechanistic modelling framework and represents an attempt to provide accurate forecasts by making full use of the rich surveillance dataset collected in France since 1985.
Affiliation(s)
- Alessio Andronico
  - Mathematical Modelling of Infectious Diseases Unit, Institut Pasteur, Université Paris Cité, UMR2000 CNRS, Paris, France
- Juliette Paireau
  - Mathematical Modelling of Infectious Diseases Unit, Institut Pasteur, Université Paris Cité, UMR2000 CNRS, Paris, France
  - Infectious Diseases Department, Santé publique France, Saint-Maurice, France
- Simon Cauchemez
  - Mathematical Modelling of Infectious Diseases Unit, Institut Pasteur, Université Paris Cité, UMR2000 CNRS, Paris, France
4
Lopez VK, Cramer EY, Pagano R, Drake JM, O’Dea EB, Adee M, Ayer T, Chhatwal J, Dalgic OO, Ladd MA, Linas BP, Mueller PP, Xiao J, Bracher J, Castro Rivadeneira AJ, Gerding A, Gneiting T, Huang Y, Jayawardena D, Kanji AH, Le K, Mühlemann A, Niemi J, Ray EL, Stark A, Wang Y, Wattanachit N, Zorn MW, Pei S, Shaman J, Yamana TK, Tarasewicz SR, Wilson DJ, Baccam S, Gurung H, Stage S, Suchoski B, Gao L, Gu Z, Kim M, Li X, Wang G, Wang L, Wang Y, Yu S, Gardner L, Jindal S, Marshall M, Nixon K, Dent J, Hill AL, Kaminsky J, Lee EC, Lemaitre JC, Lessler J, Smith CP, Truelove S, Kinsey M, Mullany LC, Rainwater-Lovett K, Shin L, Tallaksen K, Wilson S, Karlen D, Castro L, Fairchild G, Michaud I, Osthus D, Bian J, Cao W, Gao Z, Lavista Ferres J, Li C, Liu TY, Xie X, Zhang S, Zheng S, Chinazzi M, Davis JT, Mu K, Pastore y Piontti A, Vespignani A, Xiong X, Walraven R, Chen J, Gu Q, Wang L, Xu P, Zhang W, Zou D, Gibson GC, Sheldon D, Srivastava A, Adiga A, Hurt B, Kaur G, Lewis B, Marathe M, Peddireddy AS, Porebski P, Venkatramanan S, Wang L, Prasad PV, Walker JW, Webber AE, Slayton RB, Biggerstaff M, Reich NG, Johansson MA. Challenges of COVID-19 Case Forecasting in the US, 2020-2021. PLoS Comput Biol 2024; 20:e1011200. [PMID: 38709852; PMCID: PMC11098513; DOI: 10.1371/journal.pcbi.1011200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Revised: 05/16/2024] [Accepted: 04/01/2024] [Indexed: 05/08/2024] Open
Abstract
During the COVID-19 pandemic, forecasting COVID-19 trends to support planning and response was a priority for scientists and decision makers alike. In the United States, COVID-19 forecasting was coordinated by a large group of universities, companies, and government entities led by the Centers for Disease Control and Prevention and the US COVID-19 Forecast Hub (https://covid19forecasthub.org). We evaluated approximately 9.7 million forecasts of weekly state-level COVID-19 cases for predictions 1-4 weeks into the future submitted by 24 teams from August 2020 to December 2021. We assessed coverage of central prediction intervals and weighted interval scores (WIS), adjusting for missing forecasts relative to a baseline forecast, and used a Gaussian generalized estimating equation (GEE) model to evaluate differences in skill across epidemic phases that were defined by the effective reproduction number. Overall, we found high variation in skill across individual models, with ensemble-based forecasts outperforming other approaches. Forecast skill relative to the baseline was generally higher for larger jurisdictions (e.g., states compared to counties). Over time, forecasts generally performed worst in periods of rapid changes in reported cases (either in increasing or decreasing epidemic phases) with 95% prediction interval coverage dropping below 50% during the growth phases of the winter 2020, Delta, and Omicron waves. Ideally, case forecasts could serve as a leading indicator of changes in transmission dynamics. However, while most COVID-19 case forecasts outperformed a naïve baseline model, even the most accurate case forecasts were unreliable in key phases. Further research could improve forecasts of leading indicators, like COVID-19 cases, by leveraging additional real-time data, addressing performance across phases, improving the characterization of forecast confidence, and ensuring that forecasts were coherent across spatial scales. 
In the meantime, it is critical for forecast users to appreciate current limitations and use a broad set of indicators to inform pandemic-related decision making.
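The two evaluation quantities named in this abstract, prediction interval coverage and the weighted interval score (WIS), can be illustrated with a minimal sketch. It assumes the commonly used WIS definition for a forecast given as a median plus central prediction intervals; the function names and the input layout are hypothetical, and the baseline-relative adjustment and GEE analysis described in the abstract are not reproduced here.

```python
from typing import Sequence, Tuple

# (alpha, lower, upper): a central (1 - alpha) prediction interval.
Interval = Tuple[float, float, float]


def interval_score(lower: float, upper: float, y: float, alpha: float) -> float:
    """Interval score: width plus a penalty when y falls outside [lower, upper]."""
    width = upper - lower
    below = (2.0 / alpha) * max(lower - y, 0.0)  # observation under the interval
    above = (2.0 / alpha) * max(y - upper, 0.0)  # observation over the interval
    return width + below + above


def weighted_interval_score(median: float, intervals: Sequence[Interval], y: float) -> float:
    """WIS for one forecast: weighted absolute error of the median plus
    alpha/2-weighted interval scores, normalised by K + 1/2."""
    total = 0.5 * abs(y - median)
    for alpha, lower, upper in intervals:
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


def empirical_coverage(bounds: Sequence[Tuple[float, float]], observed: Sequence[float]) -> float:
    """Fraction of observations that fall inside their prediction interval."""
    hits = sum(1 for (lo, hi), y in zip(bounds, observed) if lo <= y <= hi)
    return hits / len(observed)
```

Lower WIS is better. A forecast whose intervals are narrow but miss the observation is penalised more heavily than one whose wider intervals contain it, which is why scores tend to degrade in periods of rapid change.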
Affiliation(s)
- Velma K. Lopez
  - COVID-19 Response, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Estee Y. Cramer
  - University of Massachusetts, Amherst, Amherst, Massachusetts, United States of America
- Robert Pagano
  - Unaffiliated, Tucson, Arizona, United States of America
- John M. Drake
  - University of Georgia, Athens, Georgia, United States of America
- Eamon B. O’Dea
  - University of Georgia, Athens, Georgia, United States of America
- Madeline Adee
  - Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Turgay Ayer
  - Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Jagpreet Chhatwal
  - Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Ozden O. Dalgic
  - Value Analytics Labs, Boston, Massachusetts, United States of America
- Mary A. Ladd
  - Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Benjamin P. Linas
  - Boston University School of Medicine, Boston, Massachusetts, United States of America
- Peter P. Mueller
  - Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Jade Xiao
  - Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Johannes Bracher
  - Chair of Econometrics and Statistics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Aaron Gerding
  - University of Massachusetts, Amherst, Amherst, Massachusetts, United States of America
- Tilmann Gneiting
  - Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Yuxin Huang
  - University of Massachusetts, Amherst, Amherst, Massachusetts, United States of America
- Dasuni Jayawardena
  - University of Massachusetts, Amherst, Amherst, Massachusetts, United States of America
- Abdul H. Kanji
  - University of Massachusetts, Amherst, Amherst, Massachusetts, United States of America
- Khoa Le
  - University of Massachusetts, Amherst, Amherst, Massachusetts, United States of America
- Anja Mühlemann
  - Institute of Mathematical Statistics and Actuarial Science, University of Bern, Bern, Switzerland
- Jarad Niemi
  - Iowa State University, Ames, Iowa, United States of America
- Evan L. Ray
  - University of Massachusetts, Amherst, Amherst, Massachusetts, United States of America
- Ariane Stark
  - University of Massachusetts, Amherst, Amherst, Massachusetts, United States of America
- Yijin Wang
  - University of Massachusetts, Amherst, Amherst, Massachusetts, United States of America
- Nutcha Wattanachit
  - University of Massachusetts, Amherst, Amherst, Massachusetts, United States of America
- Martha W. Zorn
  - University of Massachusetts, Amherst, Amherst, Massachusetts, United States of America
- Sen Pei
  - Mailman School of Public Health, Columbia University, New York, New York, United States of America
- Jeffrey Shaman
  - Mailman School of Public Health, Columbia University, New York, New York, United States of America
- Teresa K. Yamana
  - Mailman School of Public Health, Columbia University, New York, New York, United States of America
- Samuel R. Tarasewicz
  - Federal Reserve Bank of San Francisco, San Francisco, California, United States of America
- Daniel J. Wilson
  - Federal Reserve Bank of San Francisco, San Francisco, California, United States of America
- Sid Baccam
  - IEM, Bel Air, Maryland, United States of America
- Heidi Gurung
  - IEM, Bel Air, Maryland, United States of America
- Steve Stage
  - IEM, Baton Rouge, Louisiana, United States of America
- Lei Gao
  - George Mason University, Fairfax, Virginia, United States of America
- Zhiling Gu
  - Iowa State University, Ames, Iowa, United States of America
- Myungjin Kim
  - Kyungpook National University, Bukgu, Daegu, Republic of Korea
- Xinyi Li
  - Clemson University, Clemson, South Carolina, United States of America
- Guannan Wang
  - College of William & Mary, Williamsburg, Virginia, United States of America
- Lily Wang
  - George Mason University, Fairfax, Virginia, United States of America
- Yueying Wang
  - Amazon, Seattle, Washington, United States of America
- Shan Yu
  - University of Virginia, Charlottesville, Virginia, United States of America
- Lauren Gardner
  - Johns Hopkins University, Baltimore, Maryland, United States of America
- Sonia Jindal
  - Johns Hopkins University, Baltimore, Maryland, United States of America
- Kristen Nixon
  - Johns Hopkins University, Baltimore, Maryland, United States of America
- Juan Dent
  - Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Alison L. Hill
  - Johns Hopkins University, Baltimore, Maryland, United States of America
- Joshua Kaminsky
  - Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Elizabeth C. Lee
  - Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Justin Lessler
  - Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Claire P. Smith
  - Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Shaun Truelove
  - Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Matt Kinsey
  - Johns Hopkins University Applied Physics Lab, Baltimore, Maryland, United States of America
- Luke C. Mullany
  - Johns Hopkins University Applied Physics Lab, Baltimore, Maryland, United States of America
- Lauren Shin
  - Johns Hopkins University Applied Physics Lab, Baltimore, Maryland, United States of America
- Katharine Tallaksen
  - Johns Hopkins University Applied Physics Lab, Baltimore, Maryland, United States of America
- Shelby Wilson
  - Johns Hopkins University Applied Physics Lab, Baltimore, Maryland, United States of America
- Dean Karlen
  - University of Victoria and TRIUMF, Victoria, British Columbia, Canada
- Lauren Castro
  - Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Geoffrey Fairchild
  - Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Isaac Michaud
  - Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Dave Osthus
  - Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Jiang Bian
  - Microsoft, Redmond, Washington, United States of America
- Wei Cao
  - Microsoft, Redmond, Washington, United States of America
- Zhifeng Gao
  - Microsoft, Redmond, Washington, United States of America
- Chaozhuo Li
  - Microsoft, Redmond, Washington, United States of America
- Tie-Yan Liu
  - Microsoft, Redmond, Washington, United States of America
- Xing Xie
  - Microsoft, Redmond, Washington, United States of America
- Shun Zhang
  - Microsoft, Redmond, Washington, United States of America
- Shun Zheng
  - Microsoft, Redmond, Washington, United States of America
- Matteo Chinazzi
  - Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, Massachusetts, United States of America
- Jessica T. Davis
  - Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, Massachusetts, United States of America
- Kunpeng Mu
  - Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, Massachusetts, United States of America
- Ana Pastore y Piontti
  - Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, Massachusetts, United States of America
- Alessandro Vespignani
  - Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, Massachusetts, United States of America
- Xinyue Xiong
  - Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, Massachusetts, United States of America
- Jinghui Chen
  - University of California, Los Angeles, Los Angeles, California, United States of America
- Quanquan Gu
  - University of California, Los Angeles, Los Angeles, California, United States of America
- Lingxiao Wang
  - University of California, Los Angeles, Los Angeles, California, United States of America
- Pan Xu
  - University of California, Los Angeles, Los Angeles, California, United States of America
- Weitong Zhang
  - University of California, Los Angeles, Los Angeles, California, United States of America
- Difan Zou
  - University of California, Los Angeles, Los Angeles, California, United States of America
- Graham Casey Gibson
  - Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Daniel Sheldon
  - University of Massachusetts, Amherst, Amherst, Massachusetts, United States of America
- Ajitesh Srivastava
  - University of Southern California, Los Angeles, California, United States of America
- Aniruddha Adiga
  - University of Virginia, Charlottesville, Virginia, United States of America
- Benjamin Hurt
  - University of Virginia, Charlottesville, Virginia, United States of America
- Gursharn Kaur
  - University of Virginia, Charlottesville, Virginia, United States of America
- Bryan Lewis
  - University of Virginia, Charlottesville, Virginia, United States of America
- Madhav Marathe
  - University of Virginia, Charlottesville, Virginia, United States of America
- Lijing Wang
  - New Jersey Institute of Technology, Newark, New Jersey, United States of America
- Pragati V. Prasad
  - COVID-19 Response, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Jo W. Walker
  - COVID-19 Response, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Alexander E. Webber
  - COVID-19 Response, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Rachel B. Slayton
  - COVID-19 Response, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Matthew Biggerstaff
  - COVID-19 Response, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Nicholas G. Reich
  - University of Massachusetts, Amherst, Amherst, Massachusetts, United States of America
- Michael A. Johansson
  - COVID-19 Response, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
5
Song TH, Clemente L, Pan X, Jang J, Santillana M, Lee K. Fine-Grained Forecasting of COVID-19 Trends at the County Level in the United States. medRxiv: the preprint server for health sciences 2024:2024.01.13.24301248. [PMID: 38293076; PMCID: PMC10827234; DOI: 10.1101/2024.01.13.24301248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
The novel coronavirus (COVID-19) pandemic, first identified in Wuhan China in December 2019, has profoundly impacted various aspects of daily life, society, healthcare systems, and global health policies. There have been more than half a billion human infections and more than 6 million deaths globally attributable to COVID-19. Although treatments and vaccines to protect against COVID-19 are now available, people continue being hospitalized and dying due to COVID-19 infections. Real-time surveillance of population-level infections, hospitalizations, and deaths has helped public health officials better allocate healthcare resources and deploy mitigation strategies. However, producing reliable, real-time, short-term disease activity forecasts (one or two weeks into the future) remains a practical challenge. The recent emergence of robust time-series forecasting methodologies based on deep learning approaches has led to clear improvements in multiple research fields. We propose a recurrent neural network model named Fine-Grained Infection Forecast Network (FIGI-Net), which utilizes a stacked bidirectional LSTM structure designed to leverage fine-grained county-level data, to produce daily forecasts of COVID-19 infection trends up to two weeks in advance. We show that FIGI-Net improves existing COVID-19 forecasting approaches and delivers accurate county-level COVID-19 disease estimates. Specifically, FIGI-Net is capable of anticipating upcoming sudden changes in disease trends such as the onset of a new outbreak or the peak of an ongoing outbreak, a skill that multiple existing state-of-the-art models fail to achieve. This improved performance is observed across locations and periods. Our enhanced forecasting methodologies may help protect human populations against future disease outbreaks.
Affiliation(s)
- Tzu-Hsi Song
  - Vascular Biology Program and Department of Surgery, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Leonardo Clemente
  - Department of Physics and Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, USA
- Xiang Pan
  - Vascular Biology Program and Department of Surgery, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Junbong Jang
  - Vascular Biology Program and Department of Surgery, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Mauricio Santillana
  - Department of Physics and Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, USA
- Kwonmoo Lee
  - Vascular Biology Program and Department of Surgery, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115, USA
6
Hongliang G, Zhiyao Z, Ahmadianfar I, Escorcia-Gutierrez J, Aljehane NO, Li C. Multi-step influenza forecasting through singular value decomposition and kernel ridge regression with MARCOS-guided gradient-based optimization. Comput Biol Med 2024; 169:107888. [PMID: 38157778; DOI: 10.1016/j.compbiomed.2023.107888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 11/28/2023] [Accepted: 12/18/2023] [Indexed: 01/03/2024]
Abstract
This research delves into the significance of influenza outbreaks in public health, particularly the importance of accurate forecasts using weekly Influenza-like illness (ILI) rates. The present work develops a novel hybrid machine-learning model by combining singular value decomposition with kernel ridge regression (SKRR). In this context, a novel hybrid model known as H-SKRR is developed by combining two robust forecasting approaches, SKRR and ridge regression, which aims to improve multi-step-ahead predictions for weekly ILI rates in Southern and Northern China. The study begins with feature selection via XGBoost in the preprocessing phase, identifying optimal precursor information guided by importance factors. It decomposes the original signal using multivariate variational mode decomposition (MVMD) to address non-stationarity and complexity. H-SKRR is implemented by incorporating significant lagged-time components across sub-components. The aggregated forecasted values from these sub-components generate ILI values for two horizons (i.e., 4-and 7-weekly ahead). Employing the gradient-based optimization (GBO) algorithm fine-tunes model parameters. Furthermore, the deep random vector functional link (dRVFL), Ridge regression, and gated recurrent unit neural network (GRU) models were employed to validate the MVMD-H-SKRR-GBO paradigm's effectiveness. The outcomes, assessed using the MARCOS (Measurement of alternatives and ranking according to compromise solution) method as a multi-criteria decision-making method, highlight the superior accuracy of the MVMD-H-SKRR-GBO model in predicting ILI rates. The results clearly highlight the exceptional performance of the MVMD-H-SKRR-GBO model, with outstanding precision demonstrated by impressive R, RMSE, IA, and U95 % values of 0.946, 0.388, 0.970, and 1.075, respectively, at t + 7.
Affiliation(s)
- Guo Hongliang
  - College of Information Technology, Jilin Agricultural University, Changchun, 130118, China.
- Zhang Zhiyao
  - College of Information Technology, Jilin Agricultural University, Changchun, 130118, China.
- Iman Ahmadianfar
  - Information and Communication Technology Research Group, Scientific Research Center, Al-Ayen University, Thi-Qar, Nasiriyah, 64001, Iraq.
- José Escorcia-Gutierrez
  - Department of Computational Science and Electronics, Universidad de La Costa, CUC, Barranquilla, 080002, Colombia.
- Nojood O Aljehane
  - Faculty of Computers and Information Technology, University of Tabuk, Tabuk, Saudi Arabia, Tabuk University, KSA.
- Chengye Li
  - Department of Pulmonary and Critical Care Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China.
7
Morris M, Hayes P, Cox IJ, Lampos V. Neural network models for influenza forecasting with associated uncertainty using Web search activity trends. PLoS Comput Biol 2023; 19:e1011392. [PMID: 37639427; PMCID: PMC10491400; DOI: 10.1371/journal.pcbi.1011392] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/17/2023] [Revised: 09/08/2023] [Accepted: 07/26/2023] [Indexed: 08/31/2023] Open
Abstract
Influenza affects millions of people every year. It causes a considerable amount of medical visits and hospitalisations as well as hundreds of thousands of deaths. Forecasting influenza prevalence with good accuracy can significantly help public health agencies to timely react to seasonal or novel strain epidemics. Although significant progress has been made, influenza forecasting remains a challenging modelling task. In this paper, we propose a methodological framework that improves over the state-of-the-art forecasting accuracy of influenza-like illness (ILI) rates in the United States. We achieve this by using Web search activity time series in conjunction with historical ILI rates as observations for training neural network (NN) architectures. The proposed models incorporate Bayesian layers to produce associated uncertainty intervals to their forecast estimates, positioning themselves as legitimate complementary solutions to more conventional approaches. The best performing NN, referred to as the iterative recurrent neural network (IRNN) architecture, reduces mean absolute error by 10.3% and improves skill by 17.1% on average in nowcasting and forecasting tasks across 4 consecutive flu seasons.
Affiliation(s)
- Michael Morris
- University College London, Centre for Artificial Intelligence, Department of Computer Science, London, United Kingdom
| | - Peter Hayes
- University College London, Centre for Artificial Intelligence, Department of Computer Science, London, United Kingdom
| | - Ingemar J. Cox
- University College London, Centre for Artificial Intelligence, Department of Computer Science, London, United Kingdom
- University of Copenhagen, Department of Computer Science, Copenhagen, Denmark
| | - Vasileios Lampos
- University College London, Centre for Artificial Intelligence, Department of Computer Science, London, United Kingdom
| |
|
8
|
Ray EL, Brooks LC, Bien J, Biggerstaff M, Bosse NI, Bracher J, Cramer EY, Funk S, Gerding A, Johansson MA, Rumack A, Wang Y, Zorn M, Tibshirani RJ, Reich NG. Comparing trained and untrained probabilistic ensemble forecasts of COVID-19 cases and deaths in the United States. International Journal of Forecasting 2023; 39:1366-1383. [PMID: 35791416 PMCID: PMC9247236 DOI: 10.1016/j.ijforecast.2022.06.005] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Indexed: 06/15/2023]
Abstract
The U.S. COVID-19 Forecast Hub aggregates forecasts of the short-term burden of COVID-19 in the United States from many contributing teams. We study methods for building an ensemble that combines forecasts from these teams. These experiments have informed the ensemble methods used by the Hub. To be most useful to policymakers, ensemble forecasts must have stable performance in the presence of two key characteristics of the component forecasts: (1) occasional misalignment with the reported data, and (2) instability in the relative performance of component forecasters over time. Our results indicate that in the presence of these challenges, an untrained and robust approach to ensembling using an equally weighted median of all component forecasts is a good choice to support public health decision-makers. In settings where some contributing forecasters have a stable record of good performance, trained ensembles that give those forecasters higher weight can also be helpful.
Affiliation(s)
- Evan L Ray
- School of Public Health and Health Sciences, University of Massachusetts Amherst, United States of America
| | - Logan C Brooks
- Machine Learning Department, Carnegie Mellon University, United States of America
| | - Jacob Bien
- Department of Data Sciences and Operations, University of Southern California, United States of America
| | - Matthew Biggerstaff
- COVID-19 Response, U.S. Centers for Disease Control and Prevention, United States of America
| | - Nikos I Bosse
- London School of Hygiene & Tropical Medicine, United Kingdom
| | - Johannes Bracher
- Chair of Statistical Methods and Econometrics, Karlsruhe Institute of Technology, Germany
- Computational Statistics Group, Heidelberg Institute for Theoretical Studies, Germany
| | - Estee Y Cramer
- School of Public Health and Health Sciences, University of Massachusetts Amherst, United States of America
| | - Sebastian Funk
- London School of Hygiene & Tropical Medicine, United Kingdom
| | - Aaron Gerding
- School of Public Health and Health Sciences, University of Massachusetts Amherst, United States of America
| | - Michael A Johansson
- COVID-19 Response, U.S. Centers for Disease Control and Prevention, United States of America
| | - Aaron Rumack
- Machine Learning Department, Carnegie Mellon University, United States of America
| | - Yijin Wang
- School of Public Health and Health Sciences, University of Massachusetts Amherst, United States of America
| | - Martha Zorn
- School of Public Health and Health Sciences, University of Massachusetts Amherst, United States of America
| | - Ryan J Tibshirani
- Machine Learning Department, Carnegie Mellon University, United States of America
| | - Nicholas G Reich
- School of Public Health and Health Sciences, University of Massachusetts Amherst, United States of America
| |
|
9
|
Ma S, Ning S, Yang S. Joint COVID-19 and influenza-like illness forecasts in the United States using internet search information. Communications Medicine 2023; 3:39. [PMID: 36964311 PMCID: PMC10038385 DOI: 10.1038/s43856-023-00272-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Accepted: 03/09/2023] [Indexed: 03/26/2023] Open
Abstract
BACKGROUND As the prolonged COVID-19 pandemic continues, severe seasonal influenza (flu) may occur alongside COVID-19. This could cause a "twindemic", in which there are additional burdens on health care resources and public safety compared to those occurring in the presence of a single infection. Amidst the rising trend of co-infections of the two diseases, forecasting both influenza-like illness (ILI) outbreaks and COVID-19 waves in a reliable and timely manner becomes more urgent than ever. Accurate and real-time joint prediction of the twindemic aids public health organizations and policymakers in adequate preparation and decision making. However, in the current pandemic, existing ILI and COVID-19 forecasting models face shortcomings under complex inter-disease dynamics, particularly due to the similarities in symptoms and healthcare-seeking patterns of the two diseases. METHODS Inspired by the interconnection between ILI and COVID-19 activities, we combine related internet search and bi-disease time series information for U.S. national-level and state-level forecasts. Our proposed ARGOX-Joint-Ensemble adopts a new ensemble framework that integrates ILI and COVID-19 disease forecasting models to pool the information between the two diseases and provide joint multi-resolution and multi-target predictions. Through a winner-takes-all ensemble fashion, our framework is able to adaptively select the most predictive COVID-19 or ILI signals. RESULTS In the retrospective evaluation, our model steadily outperforms alternative benchmark methods, and remains competitive with other publicly available models in both point estimates and probabilistic predictions (including intervals). CONCLUSIONS The success of our approach illustrates that pooling information between ILI and COVID-19 leads to better forecasting models than individual models for either disease.
Affiliation(s)
- Simin Ma
- H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA
| | - Shaoyang Ning
- Department of Mathematics and Statistics, Williams College, Williamstown, MA, 01267, USA
| | - Shihao Yang
- H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA.
| |
|
10
|
Paul R, Han D, DeDoncker E, Prieto D. Dynamic downscaling and daily nowcasting from influenza surveillance data. Stat Med 2022; 41:4159-4175. [PMID: 35718471 PMCID: PMC9544787 DOI: 10.1002/sim.9502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2021] [Revised: 04/30/2022] [Accepted: 05/31/2022] [Indexed: 11/08/2022]
Abstract
Real-time trends from surveillance data are important to assess and develop preparedness for influenza outbreaks. The overwhelming testing demand and limited capacity of testing laboratories for viral positivity render daily confirmed case data inaccurate and delay its availability in preparedness. Using Bayesian dynamic downscaling models, we obtained posterior estimates for daily influenza incidences from weekly estimates of the Centers for Disease Control and Prevention and daily reported constitutional and respiratory complaints during emergency department (ED) visits obtained from the state health departments. Our model provides one-day and seven-day lead forecasts along with 95% prediction intervals. Our hybrid Markov Chain Monte Carlo and Kalman filter algorithms facilitate faster computation and enable us to update our estimates as new data become available. Our method is tested and validated using the State of Michigan data over the years 2009-2013. Reported constitutional and respiratory complaints at the EDs showed strong correlations of 0.81 and 0.68 respectively, with influenza rates. In general, our forecast model can be adapted to track an outbreak with only one respiratory virus as a causative agent.
Affiliation(s)
- Rajib Paul
- Department of Public Health Sciences, University of North Carolina at Charlotte, Charlotte, North Carolina, USA
| | - Dan Han
- Department of Mathematics, University of Louisville, Louisville, Kentucky, USA
| | - Elise DeDoncker
- Department of Computer Science, Western Michigan University, Kalamazoo, Michigan, USA
| | - Diana Prieto
- Carey School of Business, Johns Hopkins University, Baltimore, Maryland, USA
- School of Industrial Engineering, Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile
| |
|
11
|
Beesley LJ, Osthus D, Del Valle SY. Addressing delayed case reporting in infectious disease forecast modeling. PLoS Comput Biol 2022; 18:e1010115. [PMID: 35658007 PMCID: PMC9200328 DOI: 10.1371/journal.pcbi.1010115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2021] [Revised: 06/15/2022] [Accepted: 04/18/2022] [Indexed: 11/18/2022] Open
Abstract
Infectious disease forecasting is of great interest to the public health community and policymakers, since forecasts can provide insight into disease dynamics in the near future and inform interventions. Due to delays in case reporting, however, forecasting models may often underestimate the current and future disease burden. In this paper, we propose a general framework for addressing reporting delay in disease forecasting efforts with the goal of improving forecasts. We propose strategies for leveraging either historical data on case reporting or external internet-based data to estimate the amount of reporting error. We then describe several approaches for adapting general forecasting pipelines to account for under- or over-reporting of cases. We apply these methods to address reporting delay in data on dengue fever cases in Puerto Rico from 1990 to 2009 and to reports of influenza-like illness (ILI) in the United States between 2010 and 2019. Through a simulation study, we compare method performance and evaluate robustness to assumption violations. Our results show that forecasting accuracy and prediction coverage almost always increase when correction methods are implemented to address reporting delay. Some of these methods required knowledge about the reporting error or high quality external data, which may not always be available. Provided alternatives include excluding recently-reported data and performing sensitivity analysis. This work provides intuition and guidance for handling delay in disease case reporting and may serve as a useful resource to inform practical infectious disease forecasting efforts.
The public health community and policymakers are interested in using models to predict future disease rates using information about disease rates in the past. However, our data about the recent past are less reliable than older data, due to a time lag between someone getting sick and their subsequent diagnosis being officially reported. In this paper, we describe strategies to correct reported disease rates from the recent past to account for disease diagnoses that haven’t yet been reported. Using more accurate information about the recent past, we can do a better job predicting what will happen in the future.
Affiliation(s)
- Lauren J. Beesley
- Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
| | - Dave Osthus
- Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
| | - Sara Y. Del Valle
- Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
| |
|
12
|
Osthus D. Fast and accurate influenza forecasting in the United States with Inferno. PLoS Comput Biol 2022; 18:e1008651. [PMID: 35100253 PMCID: PMC8830797 DOI: 10.1371/journal.pcbi.1008651] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/21/2020] [Revised: 02/10/2022] [Accepted: 01/02/2022] [Indexed: 01/15/2023] Open
Abstract
Infectious disease forecasting is an emerging field and has the potential to improve public health through anticipatory resource allocation, situational awareness, and mitigation planning. By way of exploring and operationalizing disease forecasting, the U.S. Centers for Disease Control and Prevention (CDC) has hosted FluSight since the 2013/14 flu season, an annual flu forecasting challenge. Since FluSight’s onset, forecasters have developed and improved forecasting models in an effort to provide more timely, reliable, and accurate information about the likely progression of the outbreak. While improving the predictive performance of these forecasting models is often the primary objective, it is also important for a forecasting model to run quickly, facilitating further model development and improvement while providing flexibility when deployed in a real-time setting. In this vein I introduce Inferno, a fast and accurate flu forecasting model inspired by Dante, the top performing model in the 2018/19 FluSight challenge. When pseudoprospectively compared to all models that participated in FluSight 2018/19, Inferno would have placed 2nd in the national and regional challenge as well as the state challenge, behind only Dante. Inferno, however, runs in minutes and is trivially parallelizable, while Dante takes hours to run, representing a significant operational improvement with minimal impact to performance. Forecasting challenges like FluSight should continue to monitor and evaluate how they can be modified and expanded to incentivize the development of forecasting models that benefit public health.
Infectious disease forecasting, if accurate, timely, and reliable, can assist decision makers with resource allocation planning in an attempt to curb the negative impacts of an outbreak. Forecasting challenges, like the U.S. Centers for Disease Control and Prevention’s flu forecasting challenge, FluSight, provide a space for teams to develop and operationalize real-time forecasting models that benefit public health, with weekly forecasts made at the state-level, Health and Human Services region-level, and the United States. The ultimate goal of these models is to produce accurate forecasts within the constraints of the forecasting challenge. Having a forecasting model that runs quickly is also important for future scalability, model development, and operational flexibility. In this paper, I present a fast and accurate flu forecasting model, Inferno. Through retrospective comparisons with FluSight-participating models, Inferno was shown to be a leading forecasting model in the field. Inferno, however, runs in minutes not hours, as other leading forecasting models do. This reduction in runtime constitutes an advancement in flu forecasting, positioning Inferno to scale to more granular geographic units, like counties or health care providers.
Affiliation(s)
- Dave Osthus
- Statistical Sciences Group, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
| |
|
13
|
McAndrew T, Reich NG. Adaptively stacking ensembles for influenza forecasting. Stat Med 2021; 40:6931-6952. [PMID: 34647627 PMCID: PMC8671371 DOI: 10.1002/sim.9219] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/06/2021] [Revised: 09/13/2021] [Accepted: 09/14/2021] [Indexed: 01/01/2023]
Abstract
Seasonal influenza infects between 10 and 50 million people in the United States every year. Accurate forecasts of influenza and influenza-like illness (ILI) have been named by the CDC as an important tool to fight the damaging effects of these epidemics. Multi-model ensembles make accurate forecasts of seasonal influenza, but current operational ensemble forecasts are static: they require an abundance of past ILI data and assign fixed weights to component models at the beginning of a season, but do not update weights as new data on component model performance is collected. We propose an adaptive ensemble that (i) does not initially need data to combine forecasts and (ii) finds optimal weights which are updated week-by-week throughout the influenza season. We take a regularized likelihood approach and investigate this regularizer's ability to impact adaptive ensemble performance. After finding an optimal regularization value, we compare our adaptive ensemble to an equal-weighted and static ensemble. Applied to forecasts of short-term ILI incidence at the regional and national level, our adaptive model outperforms an equal-weighted ensemble and has similar performance to the static ensemble using only a fraction of the data available to the static ensemble. Needing no data at the beginning of an epidemic, an adaptive ensemble can quickly train and forecast an outbreak, providing a practical tool to public health officials looking for a forecast to conform to unique features of a specific season.
Affiliation(s)
- Thomas McAndrew
- Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts Amherst, Amherst, Massachusetts, United States
- College of Health, Lehigh University, Bethlehem, Pennsylvania, United States
- Correspondence: Thomas McAndrew, Lehigh University, Bethlehem, Pennsylvania, United States of America.
| | - Nicholas G. Reich
- College of Health, Lehigh University, Bethlehem, Pennsylvania, United States
| |
|
14
|
Abstract
To study the COVID-19 pandemic, its effects on society, and measures for reducing its spread, researchers need detailed data on the course of the pandemic. Standard public health data streams suffer inconsistent reporting and frequent, unexpected revisions. They also miss other aspects of a population’s behavior that are worthy of consideration. We present an open database of COVID signals in the United States, measured at the county level and updated daily. This includes traditionally reported COVID cases and deaths, and many others: measures of mobility, social distancing, internet search trends, self-reported symptoms, and patterns of COVID-related activity in deidentified medical insurance claims. The database provides all signals in a common, easy-to-use format, empowering both public health research and operational decision-making.
The COVID-19 pandemic presented enormous data challenges in the United States. Policy makers, epidemiological modelers, and health researchers all require up-to-date data on the pandemic and relevant public behavior, ideally at fine spatial and temporal resolution. The COVIDcast API is our attempt to fill this need: Operational since April 2020, it provides open access to both traditional public health surveillance signals (cases, deaths, and hospitalizations) and many auxiliary indicators of COVID-19 activity, such as signals extracted from deidentified medical claims data, massive online surveys, cell phone mobility data, and internet search trends. These are available at a fine geographic resolution (mostly at the county level) and are updated daily. The COVIDcast API also tracks all revisions to historical data, allowing modelers to account for the frequent revisions and backfill that are common for many public health data sources. All of the data are available in a common format through the API and accompanying R and Python software packages. This paper describes the data sources and signals, and provides examples demonstrating that the auxiliary signals in the COVIDcast API present information relevant to tracking COVID activity, augmenting traditional public health reporting and empowering research and decision-making.
|
15
|
Yang S, Bao Y. Comprehensive learning particle swarm optimization enabled modeling framework for multi-step-ahead influenza prediction. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107994] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/02/2023]
|
16
|
Sundar S, Schwab P, Tan JZH, Romero-Brufau S, Celi LA, Wangmo D, Penna ND. Forecasting the COVID-19 Pandemic: Lessons learned and future directions. medRxiv 2021:2021.11.06.21266007. [PMID: 34806093 PMCID: PMC8603143 DOI: 10.1101/2021.11.06.21266007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
The Coronavirus Disease 2019 (COVID-19) has demonstrated that accurate forecasts of infection and mortality rates are essential for informing healthcare resource allocation, designing countermeasures, implementing public health policies, and increasing public awareness. However, there exist a multitude of modeling methodologies, and their relative performances in accurately forecasting pandemic dynamics are not currently comprehensively understood. In this paper, we introduce the non-mechanistic MIT-LCP forecasting model, and assess and compare its performance to various mechanistic and non-mechanistic models that have been proposed for forecasting COVID-19 dynamics. We performed a comprehensive experimental evaluation which covered the time period of November 2020 to April 2021, in order to determine the relative performances of MIT-LCP and seven other forecasting models from the United States' Centers for Disease Control and Prevention (CDC) Forecast Hub. Our results show that there exist forecasting scenarios well-suited to both mechanistic and non-mechanistic models, with mechanistic models being particularly performant for forecasts that are further in the future when recent data may not be as informative, and non-mechanistic models being more effective with shorter prediction horizons when recent representative data is available. Improving our understanding of which forecasting approaches are more reliable, and in which forecasting scenarios, can assist effective pandemic preparation and management.
|
17
|
Matabuena M, Rodríguez-Mier P, García-Meixide C, Leborán V. COVID-19: Estimation of the transmission dynamics in Spain using a stochastic simulator and black-box optimization techniques. Computer Methods and Programs in Biomedicine 2021; 211:106399. [PMID: 34607036 PMCID: PMC8418989 DOI: 10.1016/j.cmpb.2021.106399] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/19/2021] [Accepted: 08/31/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND AND OBJECTIVES Epidemiological models of epidemic spread are an essential tool for optimizing decision-making. The current literature is very extensive and covers a wide variety of deterministic and stochastic models. However, with the increase in computing resources, new, more general, and flexible procedures based on simulation models can assess the effectiveness of measures and quantify the current state of the epidemic. This paper illustrates the potential of this approach to build a new dynamic probabilistic model to estimate the prevalence of SARS-CoV-2 infections in different compartments. METHODS We propose a new probabilistic model in which, for the first time in the epidemic literature, parameter learning is carried out using gradient-free stochastic black-box optimization techniques simulating multiple trajectories of the infection dynamics in a general way, solving an inverse problem that is defined employing the daily information from mortality records. RESULTS After applying the new proposal to the first and successive waves in Spain, the model's results confirm its accuracy in estimating seroprevalence and allow us to know the real dynamics of the pandemic a posteriori, to assess the impact of epidemiological measures taken by the Spanish government, and to plan subsequent decisions more efficiently with the prior knowledge obtained. CONCLUSIONS The model results allow us to estimate the daily patterns of COVID-19 infections in Spain retrospectively and examine the population's exposure to the virus dynamically in contrast to seroprevalence surveys. Furthermore, given the flexibility of our simulation framework, we can model situations (even using non-parametric distributions between the different compartments in the model) that other models in the existing literature cannot. Our general optimization strategy remains valid in these cases, and we can easily create other non-standard simulation epidemic models that incorporate more complex and dynamic structures.
Affiliation(s)
- Marcos Matabuena
- CiTIUS (Centro Singular de Investigación en Tecnoloxías Intelixentes), Universidade de Santiago de Compostela, Santiago de Compostela, Spain.
| | - Pablo Rodríguez-Mier
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse 31300, France
| | | | - Victor Leborán
- CiTIUS (Centro Singular de Investigación en Tecnoloxías Intelixentes), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
| |
|
18
|
Benecke J, Benecke C, Ciutan M, Dosius M, Vladescu C, Olsavszky V. Retrospective analysis and time series forecasting with automated machine learning of ascariasis, enterobiasis and cystic echinococcosis in Romania. PLoS Negl Trop Dis 2021; 15:e0009831. [PMID: 34723982 PMCID: PMC8584970 DOI: 10.1371/journal.pntd.0009831] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/23/2020] [Revised: 11/11/2021] [Accepted: 09/22/2021] [Indexed: 12/04/2022] Open
Abstract
The epidemiology of neglected tropical diseases (NTD) is persistently underprioritized, despite NTD being widespread among the poorest populations and in the least developed countries on earth. This situation necessitates thorough and efficient public health intervention. Romania is at the brink of becoming a developed country. However, this South-Eastern European country appears to be a region that is susceptible to an underestimated burden of parasitic diseases despite recent public health reforms. Moreover, there is an evident lack of new epidemiologic data on NTD after Romania's accession to the European Union (EU) in 2007. Using the national ICD-10 dataset for hospitalized patients in Romania, we generated time series datasets for 2008-2018. The objective was to gain deep understanding of the epidemiological distribution of three selected and highly endemic parasitic diseases, namely, ascariasis, enterobiasis and cystic echinococcosis (CE), during this period and forecast their courses for the ensuing two years. Through descriptive and inferential analysis, we observed a decline in case numbers for all three NTD. Several distributional particularities at regional level emerged. Furthermore, we performed predictions using a novel automated time series (AutoTS) machine learning tool and could interestingly show a stable course for these parasitic NTD. Such predictions can help public health officials and medical organizations to implement targeted disease prevention and control. To our knowledge, this is the first study involving a retrospective analysis of ascariasis, enterobiasis and CE on a nationwide scale in Romania. It is also the first to use AutoTS technology for parasitic NTD.
Affiliation(s)
- Johannes Benecke
- Department of Dermatology, Venereology and Allergology, University Medical Center and Medical Faculty Mannheim, University of Heidelberg, and Center of Excellence in Dermatology, Mannheim, Germany
| | - Cornelius Benecke
- Barcelona Institute for Global Health, University of Barcelona, Barcelona, Spain
| | - Marius Ciutan
- National School of Public Health Management and Professional Development, Bucharest, Romania
| | - Mihnea Dosius
- National School of Public Health Management and Professional Development, Bucharest, Romania
| | - Cristian Vladescu
- National School of Public Health Management and Professional Development, Bucharest, Romania
- University Titu Maiorescu, Faculty of Medicine, Bucharest, Romania
| | - Victor Olsavszky
- Department of Dermatology, Venereology and Allergology, University Medical Center and Medical Faculty Mannheim, University of Heidelberg, and Center of Excellence in Dermatology, Mannheim, Germany
| |
|
19
|
Oidtman RJ, Omodei E, Kraemer MUG, Castañeda-Orjuela CA, Cruz-Rivera E, Misnaza-Castrillón S, Cifuentes MP, Rincon LE, Cañon V, Alarcon PD, España G, Huber JH, Hill SC, Barker CM, Johansson MA, Manore CA, Reiner RC, Rodriguez-Barraquer I, Siraj AS, Frias-Martinez E, García-Herranz M, Perkins TA. Trade-offs between individual and ensemble forecasts of an emerging infectious disease. Nat Commun 2021; 12:5379. [PMID: 34508077 PMCID: PMC8433472 DOI: 10.1038/s41467-021-25695-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 03/03/2021] [Accepted: 08/23/2021] [Indexed: 02/08/2023] Open
Abstract
Probabilistic forecasts play an indispensable role in answering questions about the spread of newly emerged pathogens. However, uncertainties about the epidemiology of emerging pathogens can make it difficult to choose among alternative model structures and assumptions. To assess the potential for uncertainties about emerging pathogens to affect forecasts of their spread, we evaluated the performance of 16 forecasting models in the context of the 2015-2016 Zika epidemic in Colombia. Each model featured a different combination of assumptions about human mobility, spatiotemporal variation in transmission potential, and the number of virus introductions. We found that which model assumptions had the most ensemble weight changed through time. We additionally identified a trade-off whereby some individual models outperformed ensemble models early in the epidemic, but on average the ensembles outperformed all individual models. Our results suggest that multiple models spanning uncertainty across alternative assumptions are necessary to obtain robust forecasts for emerging infectious diseases.
Affiliation(s)
- Rachel J Oidtman
- Department of Biological Sciences and Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA.
- UNICEF, New York, NY, USA.
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA.
| | | | - Moritz U G Kraemer
- Department of Zoology, University of Oxford, Oxford, UK
- Boston Children's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
| | | | | | | | | | | | | | | | - Guido España
- Department of Biological Sciences and Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA
| | - John H Huber
- Department of Biological Sciences and Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA
| | - Sarah C Hill
- Department of Zoology, University of Oxford, Oxford, UK
- Department of Pathobiology and Population Sciences, The Royal Veterinary College, London, UK
| | - Christopher M Barker
- Department of Pathology, Microbiology, and Immunology, School of Veterinary Medicince, University of California, Davis, CA, USA
| | - Michael A Johansson
- Division of Vector-Borne Diseases, Centers for Disease Control and Prevention, San Juan, Puerto Rico
| | - Carrie A Manore
- Information Systems and Modeling (A-1), Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Robert C Reiner
- Institute for Health Metrics and Evaluation, University of Washington, Seattle, WA, USA
| | | | - Amir S Siraj
- Department of Biological Sciences and Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA
| | | | | | - T Alex Perkins
- Department of Biological Sciences and Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA.
| |
Collapse
|
20
|
Lu J, Meyer S. An endemic–epidemic beta model for time series of infectious disease proportions. J Appl Stat 2021; 49:3769-3783. [DOI: 10.1080/02664763.2021.1962264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Junyi Lu
  - Institute of Medical Informatics, Biometry, and Epidemiology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Sebastian Meyer
  - Institute of Medical Informatics, Biometry, and Epidemiology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
|
21
|
Lee K, Ray J, Safta C. The predictive skill of convolutional neural networks models for disease forecasting. PLoS One 2021; 16:e0254319. [PMID: 34242349 PMCID: PMC8270135 DOI: 10.1371/journal.pone.0254319] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/23/2020] [Accepted: 06/24/2021] [Indexed: 11/18/2022]
Abstract
In this paper we investigate the utility of one-dimensional convolutional neural network (CNN) models in epidemiological forecasting. Deep learning models, in particular variants of recurrent neural networks (RNNs), have been studied for ILI (Influenza-Like Illness) forecasting, and have achieved a higher forecasting skill compared to conventional models such as ARIMA. In this study, we adapt two neural networks that employ one-dimensional temporal convolutional layers as a primary building block (temporal convolutional networks and simple neural attentive meta-learners) for epidemiological forecasting. We then test them with influenza data from the US collected over 2010-2019. We find that epidemiological forecasting with CNNs is feasible, and their forecasting skill is comparable to, and at times superior to, that of plain RNNs. Thus CNNs and RNNs bring the power of nonlinear transformations to purely data-driven epidemiological models, a capability that heretofore has been limited to more elaborate mechanistic/compartmental disease models.
Affiliation(s)
- Kookjin Lee
  - Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, AZ, United States of America
  - Extreme-Scale Data Science and Analytics, Sandia National Laboratories, Livermore, CA, United States of America
- Jaideep Ray
  - Extreme-Scale Data Science and Analytics, Sandia National Laboratories, Livermore, CA, United States of America
- Cosmin Safta
  - Quantitative Modeling and Analysis, Sandia National Laboratories, Livermore, CA, United States of America
|
22
|
Turtle J, Riley P, Ben-Nun M, Riley S. Accurate influenza forecasts using type-specific incidence data for small geographic units. PLoS Comput Biol 2021; 17:e1009230. [PMID: 34324487 PMCID: PMC8354478 DOI: 10.1371/journal.pcbi.1009230] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/25/2020] [Revised: 08/10/2021] [Accepted: 06/30/2021] [Indexed: 11/24/2022]
Abstract
Influenza incidence forecasting is used to facilitate better health system planning and could potentially be used to allow at-risk individuals to modify their behavior during a severe seasonal influenza epidemic or a novel respiratory pandemic. For example, the US Centers for Disease Control and Prevention (CDC) runs an annual competition to forecast influenza-like illness (ILI) at the regional and national levels in the US, based on a standard discretized incidence scale. Here, we use a suite of forecasting models to analyze type-specific incidence at the smaller spatial scale of clusters of nearby counties. We used data from point-of-care (POC) diagnostic machines over three seasons, in 10 clusters, capturing: 57 counties; 1,061,891 total specimens; and 173,909 specimens positive for Influenza A. Total specimens were closely correlated with comparable CDC ILI data. Mechanistic models were substantially more accurate when forecasting influenza A positive POC data than total specimen POC data, especially at longer lead times. Also, models that fit subpopulations of the cluster (individual counties) separately were better able to forecast clusters than were models that directly fit to aggregated cluster data. Public health authorities may wish to consider developing forecasting pipelines for type-specific POC data in addition to ILI data. Simple mechanistic models will likely improve forecast accuracy when applied at small spatial scales to pathogen-specific data before being scaled to larger geographical units and broader syndromic data. Highly local forecasts may enable new public health messaging to encourage at-risk individuals to temporarily reduce their social mixing during seasonal peaks and guide public health intervention policy during potentially severe novel influenza pandemics.
Affiliation(s)
- James Turtle
  - Infectious Disease Group, Predictive Science Inc., San Diego, California, United States
- Pete Riley
  - Infectious Disease Group, Predictive Science Inc., San Diego, California, United States
- Michal Ben-Nun
  - Infectious Disease Group, Predictive Science Inc., San Diego, California, United States
- Steven Riley
  - Infectious Disease Group, Predictive Science Inc., San Diego, California, United States
  - MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
|
23
|
Abstract
Influenza forecasting in the United States (US) is complex and challenging due to spatial and temporal variability, nested geographic scales of interest, and heterogeneous surveillance participation. Here we present Dante, a multiscale influenza forecasting model that learns rather than prescribes spatial, temporal, and surveillance data structure and generates coherent forecasts across state, regional, and national scales. We retrospectively compare Dante's short-term and seasonal forecasts for previous flu seasons to the Dynamic Bayesian Model (DBM), a leading competitor. Dante outperformed DBM for nearly all spatial units, flu seasons, geographic scales, and forecasting targets. Dante's sharper and more accurate forecasts also suggest greater public health utility. Dante placed 1st in the Centers for Disease Control and Prevention's prospective 2018/19 FluSight challenge in both the national and regional competition and the state competition. The methodology underpinning Dante can be used in other seasonal disease forecasting contexts having nested geographic scales of interest.
Affiliation(s)
- Dave Osthus
  - Los Alamos National Laboratory, Statistical Sciences Group, Los Alamos, NM, USA
- Kelly R Moran
  - Los Alamos National Laboratory, Statistical Sciences Group, Los Alamos, NM, USA
  - Department of Statistical Science, Duke University, Durham, NC, USA
|
24
|
Chowell G, Luo R. Ensemble bootstrap methodology for forecasting dynamic growth processes using differential equations: application to epidemic outbreaks. BMC Med Res Methodol 2021; 21:34. [PMID: 33583405 PMCID: PMC7882252 DOI: 10.1186/s12874-021-01226-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/27/2020] [Accepted: 01/18/2021] [Indexed: 11/30/2022]
Abstract
BACKGROUND: Ensemble modeling aims to boost forecasting performance by systematically integrating the predictive accuracy across individual models. Here we introduce a simple-yet-powerful ensemble methodology for forecasting the trajectory of dynamic growth processes that are defined by a system of non-linear differential equations, with applications to infectious disease spread. METHODS: We propose and assess the performance of two ensemble modeling schemes with different parametric bootstrapping procedures for trajectory forecasting and uncertainty quantification. Specifically, we conduct sequential probabilistic forecasts to evaluate their forecasting performance using simple dynamical growth models with good track records, including the Richards model, the generalized-logistic growth model, and the Gompertz model. We first test and verify the functionality of the method using simulated data from phenomenological models and a mechanistic transmission model. Next, the performance of the method is demonstrated using a diversity of epidemic datasets, including scenario outbreak data of the Ebola Forecasting Challenge and real-world epidemic outbreak data for influenza, plague, Zika, and COVID-19. RESULTS: We found that the ensemble method that randomly selects a model from the set of individual models for each time point of the trajectory of the epidemic frequently outcompeted the individual models as well as an alternative ensemble method based on the weighted combination of the individual models. It also yielded broader and more realistic uncertainty bounds for the trajectory envelope, achieving not only a better coverage rate of the 95% prediction interval but also improved mean interval scores across a diversity of epidemic datasets. CONCLUSION: Our new methodology for ensemble forecasting outcompetes the component models and an alternative ensemble model that differs in how the variance is evaluated for the generation of the prediction intervals of the forecasts.
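The two evaluation metrics named in this abstract, the coverage rate of the 95% prediction interval and the mean interval score, can be sketched as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, and the interval score is written in its standard form (interval width plus a penalty of 2/alpha times the distance by which an observation falls outside the interval), which is assumed here to match the score used in the paper.

```python
def coverage_rate(lower, upper, observed):
    """Fraction of observations that fall inside their prediction interval."""
    hits = sum(1 for lo, hi, y in zip(lower, upper, observed) if lo <= y <= hi)
    return hits / len(observed)


def interval_score(lower, upper, y, alpha=0.05):
    """Standard interval score for a central (1 - alpha) prediction interval.

    Assumption: the score is the interval width plus a 2/alpha penalty for
    each unit by which the observation y lies outside [lower, upper].
    """
    score = upper - lower
    if y < lower:
        score += (2.0 / alpha) * (lower - y)
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)
    return score


def mean_interval_score(lower, upper, observed, alpha=0.05):
    """Average interval score over all forecast time points (lower is better)."""
    scores = [interval_score(lo, hi, y, alpha)
              for lo, hi, y in zip(lower, upper, observed)]
    return sum(scores) / len(scores)
```

A narrow interval that misses the observation is penalized heavily, while an overly wide interval is penalized through its width, so the two metrics are usually read together.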
Affiliation(s)
- Gerardo Chowell
  - Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA
  - Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, MD, USA
- Ruiyan Luo
  - Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA
|
25
|
Forecasting the Patients Flow at Pediatric Emergency Departments. J Med Syst 2021; 45:29. [PMID: 33506300 DOI: 10.1007/s10916-021-01712-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2020] [Accepted: 01/20/2021] [Indexed: 10/22/2022]
Abstract
Emergency departments (EDs) play a key role in the public health system and face constant growth in patient volume. Forecasting the daily volume is a major tool for adapting the allocation of resources. In this paper, we focus on pediatric EDs, which are distinguished by strong seasonal variation driven by the academic calendar. The main contribution of this paper is to integrate the effects of this calendar into the annual seasonality. We also attempted to improve the daily forecasts by first forecasting the weekly means of the flow. We trained and tested these models specifically on the pediatric EDs of the Paris university hospital trust. For the eight pediatric EDs taken together, averaged over the years 2016 to 2019, we forecast the daily volume with a Mean Absolute Percentage Error (MAPE) of 6.6% for a 7-day forecast, 7.1% for a 14-day forecast, and 7.6% for a 28-day forecast. Accounting for the academic rhythm improves performance, with results respectively 7%, 10.1%, and 8.4% better relative to a baseline model based on a periodic regression on the weeks.
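For readers unfamiliar with the metric reported above, MAPE can be computed as in the sketch below. The function name and the sample figures in the usage note are hypothetical, and the handling of zero-volume days (rejected here) is an assumption; the abstract does not say how the authors treated them.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, returned in percent.

    Assumption: every actual value is non-zero, since the percentage error
    is undefined for a day with zero visits.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)
```

With made-up daily volumes of 100 and 200 visits and forecasts of 110 and 180, each day is off by 10%, so the MAPE is 10%.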
|
26
|
Gibson GC, Moran KR, Reich NG, Osthus D. Improving probabilistic infectious disease forecasting through coherence. PLoS Comput Biol 2021; 17:e1007623. [PMID: 33406068 PMCID: PMC7837472 DOI: 10.1371/journal.pcbi.1007623] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/23/2019] [Revised: 01/26/2021] [Accepted: 09/14/2020] [Indexed: 11/19/2022]
Abstract
With an estimated $10.4 billion in medical costs and 31.4 million outpatient visits each year, influenza poses a serious burden of disease in the United States. To provide insight into, and advance warning of, the spread of influenza, the U.S. Centers for Disease Control and Prevention (CDC) runs a challenge for forecasting weighted influenza-like illness (wILI) at the national and regional level. Many models produce independent forecasts for each geographical unit, ignoring the constraint that the national wILI is a weighted sum of regional wILI, where the weights correspond to the population size of the region. We propose a novel algorithm that transforms a set of independent forecast distributions to obey this constraint, which we refer to as probabilistically coherent. Enforcing probabilistic coherence led to an increase in forecast skill for 79% of the models we tested over multiple flu seasons, highlighting the importance of respecting the forecasting system’s geographical hierarchy. Seasonal influenza causes a significant public health burden nationwide. Accurate influenza forecasting may help public health officials allocate resources and plan responses to emerging outbreaks. The U.S. Centers for Disease Control and Prevention (CDC) reports influenza data at multiple geographical units, including regionally and nationally, where the national data are by construction a weighted sum of the regional data. In an effort to improve influenza forecast accuracy across all models submitted to the CDC’s annual flu forecasting challenge, we examined the effect of imposing this geographical constraint on the set of independent forecasts, made publicly available by the CDC. We developed a novel method to transform forecast densities to obey the geographical constraint that respects the correlation structure between geographical units. This method showed consistent improvement across 79% of models, an improvement that held when stratified by targets and test seasons. Our method can be applied to other forecasting systems both within and outside an infectious disease context that have a geographical hierarchy.
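The geographical constraint described in this abstract, that national wILI is a population-weighted sum of regional wILI, can be illustrated for point forecasts as below. This sketch only checks the constraint; it does not reproduce the authors' algorithm, which transforms full forecast distributions. All function names, region labels, and numbers are hypothetical.

```python
def population_weights(populations):
    """Convert regional population counts into weights that sum to 1."""
    total = sum(populations.values())
    return {region: pop / total for region, pop in populations.items()}


def national_from_regional(regional_wili, weights):
    """National wILI implied by the regional values under the weighted-sum constraint."""
    return sum(weights[region] * regional_wili[region] for region in weights)


def coherence_gap(national_forecast, regional_forecasts, weights):
    """Difference between an independent national point forecast and the value
    implied by the regional point forecasts; zero means the set is coherent."""
    return national_forecast - national_from_regional(regional_forecasts, weights)
```

For two hypothetical regions with populations 30 and 70 and regional forecasts of 2.0 and 4.0, the implied national value is 3.4, so an independent national forecast of 3.0 would be incoherent by -0.4.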
Affiliation(s)
- Graham Casey Gibson
  - Statistical Sciences Group, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
  - Department of Biostatistics and Epidemiology, University of Massachusetts-Amherst, Amherst, Massachusetts, United States of America
- Kelly R. Moran
  - Statistical Sciences Group, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
  - Department of Statistical Science, Duke University, Durham, North Carolina, United States of America
- Nicholas G. Reich
  - Department of Biostatistics and Epidemiology, University of Massachusetts-Amherst, Amherst, Massachusetts, United States of America
- Dave Osthus
  - Statistical Sciences Group, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
|
27
|
Alahmadi A, Belet S, Black A, Cromer D, Flegg JA, House T, Jayasundara P, Keith JM, McCaw JM, Moss R, Ross JV, Shearer FM, Tun STT, Walker CR, White L, Whyte JM, Yan AWC, Zarebski AE. Influencing public health policy with data-informed mathematical models of infectious diseases: Recent developments and new challenges. Epidemics 2020; 32:100393. [PMID: 32674025 DOI: 10.1016/j.epidem.2020.100393] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 11/24/2019] [Accepted: 04/25/2020] [Indexed: 12/16/2022]
Abstract
Modern data and computational resources, coupled with algorithmic and theoretical advances to exploit these, allow disease dynamic models to be parameterised with increasing detail and accuracy. While this enhances models' usefulness in prediction and policy, major challenges remain. In particular, lack of identifiability of a model's parameters may limit the usefulness of the model. While lack of parameter identifiability may be resolved through incorporation into an inference procedure of prior knowledge, formulating such knowledge is often difficult. Furthermore, there are practical challenges associated with acquiring data of sufficient quantity and quality. Here, we discuss recent progress on these issues.
Affiliation(s)
- Amani Alahmadi
  - School of Mathematics, Faculty of Science, Monash University, Melbourne, Australia
- Sarah Belet
  - School of Mathematics, Faculty of Science, Monash University, Melbourne, Australia; Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS)
- Andrew Black
  - School of Mathematical Sciences, University of Adelaide, Adelaide, Australia; Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS)
- Deborah Cromer
  - Kirby Institute for Infection and Immunity, UNSW Sydney, Sydney, Australia and School of Mathematics and Statistics, UNSW Sydney, Sydney, Australia
- Jennifer A Flegg
  - School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Thomas House
  - Department of Mathematics, University of Manchester, Manchester, UK; IBM Research, Hartree Centre, Sci-Tech Daresbury, Warrington, UK
- Jonathan M Keith
  - School of Mathematics, Faculty of Science, Monash University, Melbourne, Australia; Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS)
- James M McCaw
  - School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia; Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne, Australia
- Robert Moss
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne, Australia
- Joshua V Ross
  - School of Mathematical Sciences, University of Adelaide, Adelaide, Australia; Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS)
- Freya M Shearer
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne, Australia
- Sai Thein Than Tun
  - Big Data Institute, Nuffield Department of Medicine, University of Oxford, UK
- Camelia R Walker
  - School of Mathematical Sciences, University of Adelaide, Adelaide, Australia
- Lisa White
  - Big Data Institute, Nuffield Department of Medicine, University of Oxford, UK
- Jason M Whyte
  - Centre of Excellence for Biosecurity Risk Analysis (CEBRA), School of BioSciences, University of Melbourne, Melbourne, Australia; Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS)
- Ada W C Yan
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, UK
|
28
|
Time Series Analysis and Forecasting with Automated Machine Learning on a National ICD-10 Database. Int J Environ Res Public Health 2020; 17:ijerph17144979. [PMID: 32664331 PMCID: PMC7400312 DOI: 10.3390/ijerph17144979] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/31/2020] [Revised: 06/29/2020] [Accepted: 07/07/2020] [Indexed: 12/22/2022]
Abstract
The application of machine learning (ML) to generate insights and make predictions on new records continues to expand within the medical community. Despite this progress, the application of time series analysis has remained underexplored due to the complexity of the underlying techniques. In this study, we have deployed a novel ML approach, called automated time series (AutoTS) machine learning, to automate data processing and the application of a multitude of models to assess which best forecasts future values. This rapid experimentation enables the selection of the most accurate model for time series predictions. Using the nation-wide ICD-10 (International Classification of Diseases, Tenth Revision) dataset of hospitalized patients in Romania, we generated time series datasets over the period 2008–2018 and performed highly accurate AutoTS predictions for the ten deadliest diseases. Forecast results for the years 2019 and 2020 were generated at the NUTS 2 (Nomenclature of Territorial Units for Statistics) regional level. To our knowledge, this is the first study to perform time series forecasting of multiple diseases at a regional level using automated time series machine learning on a national ICD-10 dataset. The deployment of AutoTS technology can help decision makers implement targeted national health policies more efficiently.
|
30
|
Xu B, Gutierrez B, Mekaru S, Sewalk K, Goodwin L, Loskill A, Cohn EL, Hswen Y, Hill SC, Cobo MM, Zarebski AE, Li S, Wu CH, Hulland E, Morgan JD, Wang L, O'Brien K, Scarpino SV, Brownstein JS, Pybus OG, Pigott DM, Kraemer MUG. Epidemiological data from the COVID-19 outbreak, real-time case information. Sci Data 2020; 7:106. [PMID: 32210236 PMCID: PMC7093412 DOI: 10.1038/s41597-020-0448-0] [Citation(s) in RCA: 198] [Impact Index Per Article: 39.6] [Received: 01/31/2020] [Accepted: 03/12/2020] [Indexed: 11/12/2022]
Abstract
Cases of a novel coronavirus were first reported in Wuhan, Hubei province, China, in December 2019 and have since spread across the world. Epidemiological studies have indicated human-to-human transmission in China and elsewhere. To aid the analysis and tracking of the COVID-19 epidemic we collected and curated individual-level data from national, provincial, and municipal health reports, as well as additional information from online reports. All data are geo-coded and, where available, include symptoms, key dates (date of onset, admission, and confirmation), and travel history. The generation of detailed, real-time, and robust data for emerging disease outbreaks is important and can help to generate robust evidence that will support and inform public health decision making.
Collapse
Affiliation(s)
- Bo Xu
- Ministry of Education Key Laboratory for Earth System Modeling, Department of Earth System Science, Tsinghua University, Beijing, China
- Department of Zoology, University of Oxford, Oxford, United Kingdom
| | - Bernardo Gutierrez
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- School of Biological and Environmental Sciences, Universidad San Francisco de Quito USFQ, Quito, Ecuador
| | - Sumiko Mekaru
- Computational Epidemiology Lab, Boston Children's Hospital, Boston, United States
- Booz Allen Hamilton, Westborough Massachusetts, United States
| | - Kara Sewalk
- Computational Epidemiology Lab, Boston Children's Hospital, Boston, United States
| | - Lauren Goodwin
- Computational Epidemiology Lab, Boston Children's Hospital, Boston, United States
| | - Alyssa Loskill
- Computational Epidemiology Lab, Boston Children's Hospital, Boston, United States
- School of Public Health, Boston University, Boston, United States
| | - Emily L Cohn
- Computational Epidemiology Lab, Boston Children's Hospital, Boston, United States
| | - Yulin Hswen
- Computational Epidemiology Lab, Boston Children's Hospital, Boston, United States
| | - Sarah C Hill
- Department of Zoology, University of Oxford, Oxford, United Kingdom
| | - Maria M Cobo
- School of Biological and Environmental Sciences, Universidad San Francisco de Quito USFQ, Quito, Ecuador
- Department of Paediatrics, University of Oxford, Oxford, United Kingdom
| | | | - Sabrina Li
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- School of Geography and the Environment, University of Oxford, Oxford, United Kingdom
| | - Chieh-Hsi Wu
- Mathematical Sciences, University of Southampton, Southampton, United Kingdom
| | - Erin Hulland
- Department of Health Metrics Sciences, University of Washington, Seattle, United States
- Institute for Health Metrics and Evaluation, University of Washington, Seattle, United States
| | - Julia D Morgan
- Department of Health Metrics Sciences, University of Washington, Seattle, United States
- Institute for Health Metrics and Evaluation, University of Washington, Seattle, United States
| | - Lin Wang
- Mathematical Modelling of Infectious Diseases Unit, Institut Pasteur, UMR2000, CNRS, Paris, France
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
| | - Katelynn O'Brien
- Computational Epidemiology Lab, Boston Children's Hospital, Boston, United States
| | - Samuel V Scarpino
- Network Science Institute, Northeastern University, Boston, United States
| | - John S Brownstein
- Computational Epidemiology Lab, Boston Children's Hospital, Boston, United States
- Department of Pediatrics, Harvard Medical School, Boston, United States
| | - Oliver G Pybus
- Department of Zoology, University of Oxford, Oxford, United Kingdom
| | - David M Pigott
- Department of Health Metrics Sciences, University of Washington, Seattle, United States.
- Institute for Health Metrics and Evaluation, University of Washington, Seattle, United States.
| | - Moritz U G Kraemer
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- Computational Epidemiology Lab, Boston Children's Hospital, Boston, United States
- Department of Pediatrics, Harvard Medical School, Boston, United States
| |
|
31
|
Lu J, Meyer S. Forecasting Flu Activity in the United States: Benchmarking an Endemic-Epidemic Beta Model. Int J Environ Res Public Health 2020; 17:E1381. [PMID: 32098038 PMCID: PMC7068443 DOI: 10.3390/ijerph17041381] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/31/2019] [Revised: 02/07/2020] [Accepted: 02/15/2020] [Indexed: 11/25/2022]
Abstract
Accurate prediction of flu activity enables health officials to plan disease prevention and allocate treatment resources. A promising forecasting approach is to adapt the well-established endemic-epidemic modeling framework to time series of infectious disease proportions. Using U.S. influenza-like illness surveillance data over 18 seasons, we assessed probabilistic forecasts of this new beta autoregressive model with proper scoring rules. Other readily available forecasting tools were used for comparison, including Prophet, (S)ARIMA and kernel conditional density estimation (KCDE). Short-term flu activity was equally well predicted up to four weeks ahead by the beta model with four autoregressive lags and by KCDE; however, the beta model runs much faster. Non-dynamic Prophet scored worst. Relative performance differed for seasonal peak prediction. Prophet produced the best peak intensity forecasts in seasons with standard epidemic curves; otherwise, KCDE outperformed all other methods. Peak timing was best predicted by SARIMA, KCDE or the beta model, depending on the season. The best overall performance when predicting peak timing and intensity was achieved by KCDE. Only KCDE and naive historical forecasts consistently outperformed the equal-bin reference approach for all test seasons. We conclude that the endemic-epidemic beta model is a performant and easy-to-implement tool to forecast flu activity a few weeks ahead. Real-time forecasting of the seasonal peak, however, should consider outputs of multiple models simultaneously, weighing their usefulness as the season progresses.
Affiliation(s)
| | - Sebastian Meyer
- Institute of Medical Informatics, Biometry, and Epidemiology, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany;
| |
|
32
|
Long JB, Ehrenfeld JM. The Role of Augmented Intelligence (AI) in Detecting and Preventing the Spread of Novel Coronavirus. J Med Syst 2020; 44:59. [PMID: 32020374 PMCID: PMC7088294 DOI: 10.1007/s10916-020-1536-6] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Indexed: 11/27/2022]
Affiliation(s)
- Justin B. Long
- Children’s Healthcare of Atlanta at Egleston, Emory University, Atlanta, GA USA
| | - Jesse M. Ehrenfeld
- Advancing a Healthier Wisconsin Endowment, Medical College of Wisconsin, Milwaukee, WI USA
| |
|
33
|
Darwish A, Rahhal Y, Jafar A. A comparative study on predicting influenza outbreaks using different feature spaces: application of influenza-like illness data from Early Warning Alert and Response System in Syria. BMC Res Notes 2020; 13:33. [PMID: 31948473 PMCID: PMC6964210 DOI: 10.1186/s13104-020-4889-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/17/2019] [Accepted: 01/03/2020] [Indexed: 11/10/2022] Open
Abstract
Objective Accurate forecasting of outbreaks of influenza-like illness (ILI) could help public health officials recommend public health actions earlier. We investigated the performance of three different feature spaces in different models to forecast the weekly ILI rate in Syria using EWARS data from the World Health Organization (WHO). The time series feature space was used first, with seven models applied: Naïve, Average, Seasonal naïve, drift, dynamic harmonic regression (Dhr), seasonal and trend decomposition using loess (STL) and TBATS. The second feature space, similar to those in some state-of-the-art work, we named the 53-weeks-before_52-first-order-difference feature space. The third one, which we proposed, we named the n-years-before_m-weeks-around (YnWm) feature space. Machine learning (ML) and deep learning (DL) models were applied to the second and third feature spaces (generalized linear model (GLM), support vector regression (SVR), gradient boosting (GB), random forest (RF) and long short term memory (LSTM)). Results The LSTM model of four layers with the 1-year-before_4-weeks-around feature space gave more accurate results than the other models and reached the lowest MAPE of 3.52% and the lowest RMSE of 0.01662. I hope that this modelling methodology can be applied in other countries and therefore help prevent and control influenza worldwide.
Affiliation(s)
- Ali Darwish
- Department of Informatics, Higher Institute for Applied Sciences and Technology, Damascus, Syria
| | - Yasser Rahhal
- Department of Informatics, Higher Institute for Applied Sciences and Technology, Damascus, Syria
| | - Assef Jafar
- Department of Informatics, Higher Institute for Applied Sciences and Technology, Damascus, Syria
| |
|
34
|
Chowell G, Luo R, Sun K, Roosa K, Tariq A, Viboud C. Real-time forecasting of epidemic trajectories using computational dynamic ensembles. Epidemics 2019; 30:100379. [PMID: 31887571 DOI: 10.1016/j.epidem.2019.100379] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 08/12/2019] [Revised: 11/24/2019] [Accepted: 11/25/2019] [Indexed: 12/20/2022] Open
Abstract
Forecasting the trajectory of social dynamic processes, such as the spread of infectious diseases, poses significant challenges that call for methods that account for data and model uncertainty. Here we introduce an ensemble model for sequential forecasting that weights a set of plausible models and uses a frequentist computational bootstrap approach to evaluate its uncertainty. We demonstrate the feasibility of our approach using simple dynamic differential-equation models and the trajectory of outbreak scenarios of the Ebola Forecasting Challenge. Specifically, we generate sequential short-term forecasts of epidemic outbreaks by combining phenomenological models that incorporate flexible epidemic growth scaling, namely the Generalized-Growth Model (GGM) and the Generalized Logistic Model (GLM). We rely on the root-mean-square error (RMSE) to quantify the quality of the models' fits during the calibration periods for weighting their contribution to the ensemble model, while forecasting performance was evaluated using the RMSE of the forecasts. For a given forecasting horizon (1-4 weeks), we report the performance of each model as the percentage of times it outperforms the other models. The overall mean RMSE performance of the GLM and the GGM-GLM ensemble models outcompeted that of participant models of the Ebola Forecasting Challenge. We also found that the ensemble model provided more accurate forecasts with higher frequency than the GGM and GLM models, but its performance varied across forecasting horizons. For instance, across all of the Ebola Challenge Scenarios, the ensemble model outperformed the other models at horizons of 2 and 3 weeks while the GLM outperformed other models at horizons of 1 and 4 weeks.
Affiliation(s)
- G Chowell
- Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA; Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, MD, USA
| | - R Luo
- Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA
| | - K Sun
- Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, MD, USA
| | - K Roosa
- Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA
| | - A Tariq
- Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA
| | - C Viboud
- Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, MD, USA
| |
|
35
|
Rangarajan P, Mody SK, Marathe M. Forecasting dengue and influenza incidences using a sparse representation of Google trends, electronic health records, and time series data. PLoS Comput Biol 2019; 15:e1007518. [PMID: 31751346 PMCID: PMC6894887 DOI: 10.1371/journal.pcbi.1007518] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 01/07/2019] [Revised: 12/05/2019] [Accepted: 10/29/2019] [Indexed: 12/20/2022] Open
Abstract
Dengue and influenza-like illness (ILI) are two of the leading causes of viral infection in the world and it is estimated that more than half the world’s population is at risk for developing these infections. It is therefore important to develop accurate methods for forecasting dengue and ILI incidences. Although data from multiple sources (such as dengue and ILI case counts, electronic health records and frequency of multiple internet search terms from Google Trends) can improve forecasts, standard time series analysis methods are inadequate to estimate all the parameter values from the limited amount of data available when multiple sources are used. In this paper, we use a computationally efficient implementation of the known variable selection method that we call the Autoregressive Likelihood Ratio (ARLR) method. This method combines sparse representation of time series data, electronic health records data (for ILI) and Google Trends data to forecast dengue and ILI incidences. This sparse representation method uses an algorithm that maximizes an appropriate likelihood ratio at every step. Using numerical experiments, we demonstrate that our method recovers the underlying sparse model much more accurately than the lasso method. We apply our method to dengue case count data from five countries/states: Brazil, Mexico, Singapore, Taiwan, and Thailand and to ILI case count data from the United States. Numerical experiments show that our method outperforms existing time series forecasting methods in forecasting the dengue and ILI case counts. In particular, our method gives an 18 percent forecast error reduction over a leading method that also uses data from multiple sources. It also performs better than other methods in predicting the peak value of the case count and the peak time.
Dengue and influenza-like illness (ILI) are leading causes of viral infection in the world and hence it is important to develop accurate methods for forecasting their incidence. We use the Autoregressive Likelihood Ratio method, which is a computationally efficient implementation of the variable selection method, in order to obtain a sparse (non-lasso) representation of time series, Google Trends and electronic health records (for ILI) data. This method is used to forecast dengue incidence in five countries/states and ILI incidence in the USA. We show that this method outperforms existing time series methods in forecasting these diseases. The method is general and can also be used to forecast other diseases.
Affiliation(s)
- Prashant Rangarajan
- Departments of Computer Science and Mathematics, Birla Institute of Technology and Science, Pilani, India
| | - Sandeep K. Mody
- Department of Mathematics, Indian Institute of Science, Bangalore, India
| | - Madhav Marathe
- Department of Computer Science, Network, Simulation Science and Advanced Computing Division, Biocomplexity Institute, University of Virginia, Charlottesville, Virginia, United States of America
| |
|
36
|
Reich NG, McGowan CJ, Yamana TK, Tushar A, Ray EL, Osthus D, Kandula S, Brooks LC, Crawford-Crudell W, Gibson GC, Moore E, Silva R, Biggerstaff M, Johansson MA, Rosenfeld R, Shaman J. Accuracy of real-time multi-model ensemble forecasts for seasonal influenza in the U.S. PLoS Comput Biol 2019; 15:e1007486. [PMID: 31756193 PMCID: PMC6897420 DOI: 10.1371/journal.pcbi.1007486] [Citation(s) in RCA: 97] [Impact Index Per Article: 16.2] [Received: 06/25/2019] [Revised: 12/06/2019] [Accepted: 10/14/2019] [Indexed: 11/19/2022] Open
Abstract
Seasonal influenza results in substantial annual morbidity and mortality in the United States and worldwide. Accurate forecasts of key features of influenza epidemics, such as the timing and severity of the peak incidence in a given season, can inform public health response to outbreaks. As part of ongoing efforts to incorporate data and advanced analytical methods into public health decision-making, the United States Centers for Disease Control and Prevention (CDC) has organized seasonal influenza forecasting challenges since the 2013/2014 season. In the 2017/2018 season, 22 teams participated. A subset of four teams created a research consortium called the FluSight Network in early 2017. During the 2017/2018 season they worked together to produce a collaborative multi-model ensemble that combined 21 separate component models into a single model using a machine learning technique called stacking. This approach creates a weighted average of predictive densities where the weight for each component is determined by maximizing overall ensemble accuracy over past seasons. In the 2017/2018 influenza season, one of the largest seasonal outbreaks in the last 15 years, this multi-model ensemble performed better on average than all individual component models and placed second overall in the CDC challenge. It also outperformed the baseline multi-model ensemble created by the CDC that took a simple average of all models submitted to the forecasting challenge. This project shows that collaborative efforts between research teams to develop ensemble forecasting approaches can bring measurable improvements in forecast accuracy and important reductions in the variability of performance from year to year. 
Efforts such as this, that emphasize real-time testing and evaluation of forecasting models and facilitate the close collaboration between public health officials and modeling researchers, are essential to improving our understanding of how best to use forecasts to improve public health response to seasonal and emerging epidemic threats.
Affiliation(s)
- Nicholas G. Reich
- Department of Biostatistics and Epidemiology, University of Massachusetts-Amherst, Amherst, Massachusetts, United States of America
| | - Craig J. McGowan
- Influenza Division, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
| | - Teresa K. Yamana
- Department of Environmental Health Sciences, Columbia University, New York, New York, United States of America
| | - Abhinav Tushar
- School of Computer Science, University of Massachusetts-Amherst, Amherst, Massachusetts, United States of America
| | - Evan L. Ray
- Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, Massachusetts, United States of America
| | - Dave Osthus
- Statistical Sciences Group, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
| | - Sasikiran Kandula
- Department of Environmental Health Sciences, Columbia University, New York, New York, United States of America
| | - Logan C. Brooks
- Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
| | - Willow Crawford-Crudell
- Department of Mathematics and Statistics, Smith College, Northampton, Massachusetts, United States of America
| | - Graham Casey Gibson
- Department of Biostatistics and Epidemiology, University of Massachusetts-Amherst, Amherst, Massachusetts, United States of America
| | - Evan Moore
- Department of Biostatistics and Epidemiology, University of Massachusetts-Amherst, Amherst, Massachusetts, United States of America
| | - Rebecca Silva
- Department of Mathematics and Statistics, Amherst College, Amherst, Massachusetts, United States of America
| | - Matthew Biggerstaff
- Influenza Division, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
| | - Michael A. Johansson
- Division of Vector-Borne Diseases, Centers for Disease Control and Prevention, San Juan, Puerto Rico, United States of America
| | - Roni Rosenfeld
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
| | - Jeffrey Shaman
- Department of Environmental Health Sciences, Columbia University, New York, New York, United States of America
| |
|
37
|
Estimating influenza incidence using search query deceptiveness and generalized ridge regression. PLoS Comput Biol 2019; 15:e1007165. [PMID: 31574086 PMCID: PMC6771994 DOI: 10.1371/journal.pcbi.1007165] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/23/2018] [Accepted: 05/31/2019] [Indexed: 11/22/2022] Open
Abstract
Seasonal influenza is a sometimes surprisingly impactful disease, causing thousands of deaths per year along with much additional morbidity. Timely knowledge of the outbreak state is valuable for managing an effective response. The current state of the art is to gather this knowledge using in-person patient contact. While accurate, this is time-consuming and expensive. This has motivated inquiry into new approaches using internet activity traces, based on the theory that lay observations of health status lead to informative features in internet data. These approaches risk being deceived by activity traces having a coincidental, rather than informative, relationship to disease incidence; to our knowledge, this risk has not yet been quantitatively explored. We evaluated both simulated and real activity traces of varying deceptiveness for influenza incidence estimation using linear regression. We found that deceptiveness knowledge does reduce error in such estimates, that it may help automatically-selected features perform as well as or better than features that require human curation, and that a semantic distance measure derived from the Wikipedia article category tree serves as a useful proxy for deceptiveness. This suggests that disease incidence estimation models should incorporate not only data about how internet features map to incidence but also additional data to estimate feature deceptiveness. By doing so, we may gain one more step along the path to accurate, reliable disease incidence estimation using internet data. This capability would improve public health by decreasing the cost and increasing the timeliness of such estimates.
While often considered a minor infection, seasonal flu kills many thousands of people every year and sickens millions more. The more accurate and up-to-date public health officials’ view of what the seasonal outbreak is, the more effectively the outbreak can be addressed. Currently, this knowledge is based on collating information on patients who enter the health care system. This approach is accurate, but it’s also expensive and slow. Researchers hope that new approaches based on examining what people do and share on the internet may work more cheaply and quickly. Some internet activity, however, has a history of correspondence with disease activity, but this relationship is coincidental rather than informative. For example, some prior work has found a correspondence between zombie-related social media messages and the flu season, so one could plausibly build accurate flu estimates using such messages that are then fooled by the appearance of a new zombie movie. We tested flu estimation models that incorporate information about this risk of deception, finding that knowledge of deceptiveness does indeed produce more accurate estimates; we also identified a method to estimate deceptiveness. Our results suggest that estimation models used in practice should use information about both how inputs map to disease activity and how likely each input is to be deceptive. This may get us one step closer to accurate, reliable disease estimates based on internet data, which would improve public health by making those estimates faster and cheaper.
|
38
|
On the multibin logarithmic score used in the FluSight competitions. Proc Natl Acad Sci U S A 2019; 116:20809-20810. [PMID: 31558612 DOI: 10.1073/pnas.1912147116] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022] Open
|
39
|
Kandula S, Shaman J. Reappraising the utility of Google Flu Trends. PLoS Comput Biol 2019; 15:e1007258. [PMID: 31374088 PMCID: PMC6693776 DOI: 10.1371/journal.pcbi.1007258] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 10/29/2018] [Revised: 08/14/2019] [Accepted: 07/09/2019] [Indexed: 11/18/2022] Open
Abstract
Estimation of influenza-like illness (ILI) using search trends activity was intended to supplement traditional surveillance systems, and was a motivation behind the development of Google Flu Trends (GFT). However, several studies have previously reported large errors in GFT estimates of ILI in the US. Following recent release of time-stamped surveillance data, which better reflects real-time operational scenarios, we reanalyzed GFT errors. Using three data sources (GFT: an archive of weekly ILI estimates from Google Flu Trends; ILIf: fully-observed ILI rates from ILINet; and ILIp: ILI rates available in real-time based on partial reporting), five influenza seasons were analyzed and mean square errors (MSE) of GFT and ILIp as estimates of ILIf were computed. To correct GFT errors, a random forest regression model was built with ILI and GFT rates from the previous three weeks as predictors. An overall reduction in error of 44% was observed and the errors of the corrected GFT are lower than those of ILIp. An 80% reduction in error during 2012/13, when GFT had large errors, shows that extreme failures of GFT could have been avoided. Using autoregressive integrated moving average (ARIMA) models, one- to four-week ahead forecasts were generated with two separate data streams: ILIp alone, and with both ILIp and corrected GFT. At all forecast targets and seasons, and for all but two regions, inclusion of GFT lowered MSE. Results from two alternative error measures, mean absolute error and mean absolute proportional error, were largely consistent with results from MSE. Taken together these findings provide an error profile of GFT in the US, establish strong evidence for the adoption of search trends based 'nowcasts' in influenza forecast systems, and encourage reevaluation of the utility of this data source in diverse domains.
Affiliation(s)
- Sasikiran Kandula
- Department of Environmental Health Sciences, Columbia University, New York, New York, United States of America
| | - Jeffrey Shaman
- Department of Environmental Health Sciences, Columbia University, New York, New York, United States of America
| |
|
40
|
Ben-Nun M, Riley P, Turtle J, Bacon DP, Riley S. Forecasting national and regional influenza-like illness for the USA. PLoS Comput Biol 2019; 15:e1007013. [PMID: 31120881 PMCID: PMC6557527 DOI: 10.1371/journal.pcbi.1007013] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 04/18/2018] [Revised: 06/10/2019] [Accepted: 04/09/2019] [Indexed: 01/16/2023] Open
Abstract
Health planners use forecasts of key metrics associated with influenza-like illness (ILI): near-term weekly incidence, week of season onset, week of peak, and intensity of peak. Here, we describe our participation in a weekly prospective ILI forecasting challenge for the United States for the 2016-17 season and subsequent evaluation of our performance. We implemented a metapopulation model framework with 32 model variants. Variants differed from each other in their assumptions about: the force-of-infection (FOI); use of uninformative priors; the use of discounted historical data for not-yet-observed time points; and the treatment of regions as either independent or coupled. Individual model variants were chosen subjectively as the basis for our weekly forecasts; however, a subset of coupled models was only available part way through the season. Most frequently, during the 2016-17 season, we chose: FOI variants with both school vacations and humidity terms; uninformative priors; the inclusion of discounted historical data for not-yet-observed time points; and coupled regions (when available). Our near-term weekly forecasts substantially over-estimated incidence early in the season when coupled models were not available. However, our forecast accuracy improved in absolute terms and relative to other teams once coupled solutions were available. In retrospective analysis, we found that the 2016-17 season was not typical: on average, coupled models performed better when fit without historically augmented data. Also, we tested a simple ensemble model for the 2016-17 season and found that it underperformed our subjective choice for all forecast targets. In this study, we were able to improve accuracy during a prospective forecasting exercise by coupling dynamics between regions. 
Although reduction of forecast subjectivity should be a long-term goal, some degree of human intervention is likely to improve forecast accuracy in the medium-term in parallel with the systematic consideration of more sophisticated ensemble approaches.
Affiliation(s)
- Michal Ben-Nun
- Predictive Science Inc., San Diego, CA, USA
| | - Pete Riley
- Predictive Science Inc., San Diego, CA, USA
| | | | | | - Steven Riley
- MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, UK
| |
|
41
|
Drake JM, Brett TS, Chen S, Epureanu BI, Ferrari MJ, Marty É, Miller PB, O’Dea EB, O’Regan SM, Park AW, Rohani P. The statistics of epidemic transitions. PLoS Comput Biol 2019; 15:e1006917. [PMID: 31067217 PMCID: PMC6505855 DOI: 10.1371/journal.pcbi.1006917] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 11/19/2022] Open
Abstract
Emerging and re-emerging pathogens exhibit very complex dynamics, are hard to model and difficult to predict. Their dynamics might appear intractable. However, new statistical approaches, rooted in dynamical systems and the theory of stochastic processes, have yielded insight into the dynamics of emerging and re-emerging pathogens. We argue that these approaches may lead to new methods for predicting epidemics. This perspective views pathogen emergence and re-emergence as a "critical transition," and uses the concept of noisy dynamic bifurcation to understand the relationship between the system observables and the distance to this transition. Because the system dynamics exhibit characteristic fluctuations in response to perturbations for a system in the vicinity of a critical point, we propose this information may be harnessed to develop early warning signals. Specifically, the motion of perturbations slows as the system approaches the transition.
Affiliation(s)
- John M. Drake
- Odum School of Ecology, University of Georgia, Athens, Georgia, United States of America
- Center for the Ecology of Infectious Diseases, University of Georgia, Athens, Georgia, United States of America
| | - Tobias S. Brett
- Odum School of Ecology, University of Georgia, Athens, Georgia, United States of America
- Center for the Ecology of Infectious Diseases, University of Georgia, Athens, Georgia, United States of America
| | - Shiyang Chen
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Bogdan I. Epureanu
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, Michigan, United States of America
- Automotive Research Center, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Matthew J. Ferrari
- Center for Infectious Disease Dynamics, Pennsylvania State University, State College, Pennsylvania, United States of America
| | - Éric Marty
- Odum School of Ecology, University of Georgia, Athens, Georgia, United States of America
- Center for the Ecology of Infectious Diseases, University of Georgia, Athens, Georgia, United States of America
| | - Paige B. Miller
- Odum School of Ecology, University of Georgia, Athens, Georgia, United States of America
- Center for the Ecology of Infectious Diseases, University of Georgia, Athens, Georgia, United States of America
| | - Eamon B. O’Dea
- Odum School of Ecology, University of Georgia, Athens, Georgia, United States of America
- Center for the Ecology of Infectious Diseases, University of Georgia, Athens, Georgia, United States of America
| | - Suzanne M. O’Regan
- Department of Mathematics, North Carolina A&T State University, Greensboro, North Carolina, United States of America
| | - Andrew W. Park
- Odum School of Ecology, University of Georgia, Athens, Georgia, United States of America
- Center for the Ecology of Infectious Diseases, University of Georgia, Athens, Georgia, United States of America
| | - Pejman Rohani
- Odum School of Ecology, University of Georgia, Athens, Georgia, United States of America
- Center for the Ecology of Infectious Diseases, University of Georgia, Athens, Georgia, United States of America
|
42
|
Reich NG, Brooks LC, Fox SJ, Kandula S, McGowan CJ, Moore E, Osthus D, Ray EL, Tushar A, Yamana TK, Biggerstaff M, Johansson MA, Rosenfeld R, Shaman J. A collaborative multiyear, multimodel assessment of seasonal influenza forecasting in the United States. Proc Natl Acad Sci U S A 2019; 116:3146-3154. [PMID: 30647115 PMCID: PMC6386665 DOI: 10.1073/pnas.1812594116] [Citation(s) in RCA: 154] [Impact Index Per Article: 25.7] [Indexed: 01/25/2023]
Abstract
Influenza infects an estimated 9-35 million individuals each year in the United States and is a contributing cause for between 12,000 and 56,000 deaths annually. Seasonal outbreaks of influenza are common in temperate regions of the world, with highest incidence typically occurring in colder and drier months of the year. Real-time forecasts of influenza transmission can inform public health response to outbreaks. We present the results of a multiinstitution collaborative effort to standardize the collection and evaluation of forecasting models for influenza in the United States for the 2010/2011 through 2016/2017 influenza seasons. For these seven seasons, we assembled weekly real-time forecasts of seven targets of public health interest from 22 different models. We compared forecast accuracy of each model relative to a historical baseline seasonal average. Across all regions of the United States, over half of the models showed consistently better performance than the historical baseline when forecasting incidence of influenza-like illness 1 wk, 2 wk, and 3 wk ahead of available data and when forecasting the timing and magnitude of the seasonal peak. In some regions, delays in data reporting were strongly and negatively associated with forecast accuracy. More timely reporting and an improved overall accessibility to novel and traditional data sources are needed to improve forecasting accuracy and its integration with real-time public health decision making.
Affiliation(s)
- Nicholas G Reich
- Department of Biostatistics and Epidemiology, University of Massachusetts-Amherst, Amherst, MA 01003
| | - Logan C Brooks
- Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
| | - Spencer J Fox
- Department of Integrative Biology, University of Texas at Austin, Austin, TX 78712
| | - Sasikiran Kandula
- Department of Environmental Health Sciences, Columbia University, New York, NY 10032
| | - Craig J McGowan
- Influenza Division, Centers for Disease Control and Prevention, Atlanta, GA 30333
| | - Evan Moore
- Department of Biostatistics and Epidemiology, University of Massachusetts-Amherst, Amherst, MA 01003
| | - Dave Osthus
- Statistical Sciences Group, Los Alamos National Laboratory, Los Alamos, NM 87545
| | - Evan L Ray
- Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA 01075
| | - Abhinav Tushar
- Department of Biostatistics and Epidemiology, University of Massachusetts-Amherst, Amherst, MA 01003
| | - Teresa K Yamana
- Department of Environmental Health Sciences, Columbia University, New York, NY 10032
| | - Matthew Biggerstaff
- Influenza Division, Centers for Disease Control and Prevention, Atlanta, GA 30333
| | - Michael A Johansson
- Division of Vector-Borne Diseases, Centers for Disease Control and Prevention, San Juan, PR 00920
| | - Roni Rosenfeld
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213
| | - Jeffrey Shaman
- Department of Environmental Health Sciences, Columbia University, New York, NY 10032
|
43
|
Osthus D, Daughton AR, Priedhorsky R. Even a good influenza forecasting model can benefit from internet-based nowcasts, but those benefits are limited. PLoS Comput Biol 2019; 15:e1006599. [PMID: 30707689 PMCID: PMC6373968 DOI: 10.1371/journal.pcbi.1006599] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 05/21/2018] [Revised: 02/13/2019] [Accepted: 10/30/2018] [Indexed: 11/19/2022]
Abstract
The ability to produce timely and accurate flu forecasts in the United States can significantly impact public health. Augmenting forecasts with internet data has shown promise for improving forecast accuracy and timeliness in controlled settings, but results in practice are less convincing, as models augmented with internet data have not consistently outperformed models without internet data. In this paper, we perform a controlled experiment, taking into account data backfill, to improve clarity on the benefits and limitations of augmenting an already good flu forecasting model with internet-based nowcasts. Our results show that a good flu forecasting model can benefit from the augmentation of internet-based nowcasts in practice for all considered public health-relevant forecasting targets. The degree of forecast improvement due to nowcasting, however, is uneven across forecasting targets, with short-term forecasting targets seeing the largest improvements and seasonal targets such as the peak timing and intensity seeing relatively marginal improvements. The uneven forecasting improvements across targets hold even when "perfect" nowcasts are used. These findings suggest that further improvements to flu forecasting, particularly seasonal targets, will need to derive from other, non-nowcasting approaches.
Affiliation(s)
- Dave Osthus
- Los Alamos National Laboratory, Los Alamos, New Mexico, USA
| | - Ashlynn R. Daughton
- Los Alamos National Laboratory, Los Alamos, New Mexico, USA
- University of Colorado Boulder, Boulder, Colorado, USA
|