1
Ray EL, Wang Y, Wolfinger RD, Reich NG. Flusion: Integrating multiple data sources for accurate influenza predictions. Epidemics 2025; 50:100810. [PMID: 39818098 DOI: 10.1016/j.epidem.2024.100810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2024] [Revised: 09/26/2024] [Accepted: 12/06/2024] [Indexed: 01/18/2025] Open
Abstract
Over the last ten years, the US Centers for Disease Control and Prevention (CDC) has organized an annual influenza forecasting challenge with the motivation that accurate probabilistic forecasts could improve situational awareness and yield more effective public health actions. Starting with the 2021/22 influenza season, the forecasting targets for this challenge have been based on hospital admissions reported in the CDC's National Healthcare Safety Network (NHSN) surveillance system. Reporting of influenza hospital admissions through NHSN began within the last few years, and as such only a limited amount of historical data are available for this target signal. To produce forecasts in the presence of limited data for the target surveillance system, we augmented these data with two signals that have a longer historical record: 1) ILI+, which estimates the proportion of outpatient doctor visits where the patient has influenza; and 2) rates of laboratory-confirmed influenza hospitalizations at a selected set of healthcare facilities. Our model, Flusion, is an ensemble model that combines two machine learning models using gradient boosting for quantile regression based on different feature sets with a Bayesian autoregressive model. The gradient boosting models were trained on all three data signals, while the autoregressive model was trained on only data for the target surveillance signal, NHSN admissions; all three models were trained jointly on data for multiple locations. In each week of the influenza season, these models produced quantiles of a predictive distribution of influenza hospital admissions in each state for the current week and the following three weeks; the ensemble prediction was computed by averaging these quantile predictions. Flusion emerged as the top-performing model in the CDC's influenza prediction challenge for the 2023/24 season. 
In this article we investigate the factors contributing to Flusion's success, and we find that its strong performance was primarily driven by the use of a gradient boosting model that was trained jointly on data from multiple surveillance signals and multiple locations. These results indicate the value of sharing information across multiple locations and surveillance signals, especially when doing so adds to the pool of available training data.
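The quantile-averaging step described in the abstract can be illustrated with a minimal sketch. It assumes an unweighted mean taken separately at each quantile level; the function name `average_quantiles`, the three component dictionaries, and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def average_quantiles(model_quantiles):
    """Average quantile predictions across models, one quantile level at a time.

    model_quantiles: list of dicts mapping quantile level -> predicted value.
    Assumption: every model reports the same levels and all weights are equal.
    """
    levels = sorted(model_quantiles[0])
    for q in model_quantiles:
        if sorted(q) != levels:
            raise ValueError("all models must report the same quantile levels")
    return {level: fmean(q[level] for q in model_quantiles) for level in levels}


# Hypothetical weekly admission quantiles for one state from three component models.
gbq_a = {0.1: 80.0, 0.5: 120.0, 0.9: 200.0}  # gradient boosting, feature set A
gbq_b = {0.1: 90.0, 0.5: 130.0, 0.9: 220.0}  # gradient boosting, feature set B
ar = {0.1: 70.0, 0.5: 110.0, 0.9: 180.0}     # autoregressive model

ensemble = average_quantiles([gbq_a, gbq_b, ar])
print(ensemble)
```

Averaging level by level keeps the ensemble quantiles in non-decreasing order whenever each component's quantiles are, because a mean of monotone sequences is itself monotone.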
Affiliation(s)
- Evan L Ray
- Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA, United States.
- Yijin Wang
- Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA, United States
- Nicholas G Reich
- Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA, United States
2
Shen X, Rumack A, Wilder B, Tibshirani RJ. Nowcasting reported covid-19 hospitalizations using de-identified, aggregated medical insurance claims data. PLoS Comput Biol 2025; 21:e1012717. [PMID: 39965031 PMCID: PMC11841917 DOI: 10.1371/journal.pcbi.1012717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 02/20/2025] [Accepted: 12/12/2024] [Indexed: 02/20/2025] Open
Abstract
We propose, implement, and evaluate a method for nowcasting the daily number of new COVID-19 hospitalizations, at the level of individual US states, based on de-identified, aggregated medical insurance claims data. Our analysis proceeds under a hypothetical scenario in which, during the Delta wave, states only report data on the first day of each month, and on this day, report COVID-19 hospitalization counts for each day in the previous month. In this hypothetical scenario (just as in reality), medical insurance claims data continues to be available daily. At the beginning of each month, we train a regression model, using all data available thus far, to predict hospitalization counts from medical insurance claims. We then use this model to nowcast the (unseen) values of COVID-19 hospitalization counts from medical insurance claims, at each day in the following month. Our analysis uses properly-versioned data, which would have been available in real-time at the time predictions are produced (instead of using data that would have only been available in hindsight). In spite of the difficulties inherent to real-time estimation (e.g., latency and backfill) and the complex dynamics behind COVID-19 hospitalizations themselves, we find altogether that medical insurance claims can be an accurate predictor of hospitalization reports, with mean absolute errors typically around 0.4 hospitalizations per 100,000 people, i.e., proportion of variance explained around 75%. Perhaps more importantly, we find that nowcasts made using medical insurance claims are able to qualitatively capture the dynamics (upswings and downswings) of hospitalization waves, which are key features that inform public health decision-making.
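The train-then-nowcast cycle described in the abstract can be sketched as follows. The abstract does not specify the regression model, so this sketch assumes a one-variable ordinary least squares fit of hospitalizations on a claims signal; the helper names `fit_line`, `nowcast`, and `mean_absolute_error` and all data values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nowcast(train_claims, train_hosp, new_claims):
    """Fit on data available so far, then predict unseen hospitalizations from claims."""
    slope, intercept = fit_line(train_claims, train_hosp)
    return [intercept + slope * c for c in new_claims]


def mean_absolute_error(predicted, actual):
    """Average absolute gap between nowcasts and later-reported values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical rates per 100,000 people: data reported through the start of the month...
past_claims = [1.0, 2.0, 3.0, 4.0]
past_hosp = [2.0, 4.0, 6.0, 8.0]
# ...and claims observed daily during the following month, before hospital reports arrive.
current_claims = [5.0, 6.0]

predictions = nowcast(past_claims, past_hosp, current_claims)
# Once the month's hospitalization reports arrive, the nowcasts can be scored.
later_reported = [10.5, 11.5]
error = mean_absolute_error(predictions, later_reported)
print(predictions, error)
```

In a faithful replication the training inputs would be the versions of each signal available on the training date, as the abstract stresses, rather than the finalized values.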
Affiliation(s)
- Xueda Shen
- Department of Biostatistics, University of California, Berkeley, California, United States of America
- Aaron Rumack
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Bryan Wilder
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Ryan J Tibshirani
- Department of Statistics, University of California, Berkeley, California, United States of America
3
Pekarek MJ, Weaver EA. Influenza B Virus Vaccine Innovation through Computational Design. Pathogens 2024; 13:755. [PMID: 39338946 PMCID: PMC11434669 DOI: 10.3390/pathogens13090755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Revised: 08/26/2024] [Accepted: 08/31/2024] [Indexed: 09/30/2024] Open
Abstract
As respiratory pathogens, influenza B viruses (IBVs) cause a significant socioeconomic burden each year. Vaccine and antiviral development for influenza viruses has historically viewed IBVs as a secondary concern to influenza A viruses (IAVs) due to their lack of animal reservoirs compared to IAVs. However, prior to the global spread of SARS-CoV-2, the seasonal epidemics caused by IBVs were becoming less predictable and inducing more severe disease, especially in high-risk populations. Globally, researchers have begun to recognize the need for improved prevention strategies for IBVs as a primary concern. This review discusses what is known about IBV evolutionary patterns and the effect of the spread of SARS-CoV-2 on these patterns. We also analyze recent advancements in the development of novel vaccines tested against IBVs, highlighting the promise of computational vaccine design strategies when used to target both IBVs and IAVs and explain why these novel strategies can be employed to improve the effectiveness of IBV vaccines.
Affiliation(s)
- Eric A. Weaver
- Nebraska Center for Virology, School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
4
Chen Y, Hou W, Hou W, Dong J. Lagging effects and prediction of pollutants and their interaction modifiers on influenza in northeastern China. BMC Public Health 2023; 23:1826. [PMID: 37726705 PMCID: PMC10510220 DOI: 10.1186/s12889-023-16712-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Accepted: 09/06/2023] [Indexed: 09/21/2023] Open
Abstract
BACKGROUND Previous studies have typically explored the daily lagged relations between influenza and meteorology, but few have explored seasonally the monthly lagged relationship, interaction and multiple prediction between influenza and pollution. Our specific objectives are to evaluate the lagged and interaction effects of pollution factors and construct models for estimating influenza incidence in a hierarchical manner. METHODS Our researchers collect influenza case data from 2005 to 2018 with meteorological and contaminative factors in Northeast China. We develop a generalized additive model with up to 6 months of maximum lag to analyze the impact of pollution factors on influenza cases and their interaction effects. We employ LASSO regression to identify the most significant environmental factors and conduct multiple complex regression analysis. In addition, quantile regression is taken to model the relation between influenza morbidity and specific percentiles (or quantiles) of meteorological factors. RESULTS The influenza epidemic in Northeast China has shown an upward trend year by year. The excessive incidence of influenza in Northeast China may be attributed to the suspected primary air pollutant, NO2, which has been observed to have overall low levels during January, March, and June. The Age 15-24 group shows an increase in the relative risk of influenza with an increase in PM2.5 concentration, with a lag of 0-6 months (ERR 1.08, 95% CI 0.10-2.07). In the quantitative analysis of the interaction model, PM10 at the level of 100-120 μg/m3, PM2.5 at the level of 60-80 μg/m3, and NO2 at the level of 60 μg/m3 or more have the greatest effect on the onset of influenza. The GPR model behaves better among prediction models. CONCLUSIONS Exposure to the air pollutant NO2 is associated with an increased risk of influenza with a cumulative lag effect. Prioritizing winter and spring pollution monitoring and influenza prediction modeling should be our focus.
Affiliation(s)
- Ye Chen
- Department of Infectious Disease, Shenyang Center for Disease Control and Prevention, 110100, Shenyang, Liaoning Province, People's Republic of China
- Shenyang Natural Focal Diseases Clinical Medical Research Center, 110100, Shenyang, Liaoning Province, People's Republic of China
- Weiming Hou
- Department of Occupational and Environmental Health, School of Public Health, China Medical University, No.77 Puhe Road, 110122, Shenyang, People's Republic of China
- Weiyu Hou
- The First Hospital of Shanxi Medical University, No.85 Jiefang South Road, 030012, Taiyuan, People's Republic of China
- Jing Dong
- Department of Occupational and Environmental Health, School of Public Health, China Medical University, No.77 Puhe Road, 110122, Shenyang, People's Republic of China.
- Key Laboratory of Environmental Stress and Chronic Disease Control & Prevention (China Medical University), Ministry of Education, No.77 Puhe Road, 110122, Shenyang, People's Republic of China.
5
Jahja M, Chin A, Tibshirani RJ. Real-Time Estimation of COVID-19 Infections: Deconvolution and Sensor Fusion. Stat Sci 2022. [DOI: 10.1214/22-sts856] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/19/2022]
Affiliation(s)
- Maria Jahja
- Maria Jahja is Ph.D. Candidate, Department of Statistics & Data Science, Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Andrew Chin
- Andrew Chin is Statistical Developer, Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Ryan J. Tibshirani
- Ryan J. Tibshirani is Professor, Department of Statistics & Data Science, Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
6
Epidemic tracking and forecasting: Lessons learned from a tumultuous year. Proc Natl Acad Sci U S A 2021; 118:2111456118. [PMID: 34903658 PMCID: PMC8713795 DOI: 10.1073/pnas.2111456118] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Accepted: 10/23/2021] [Indexed: 01/15/2023] Open
7
Turtle J, Riley P, Ben-Nun M, Riley S. Accurate influenza forecasts using type-specific incidence data for small geographic units. PLoS Comput Biol 2021; 17:e1009230. [PMID: 34324487 PMCID: PMC8354478 DOI: 10.1371/journal.pcbi.1009230] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/25/2020] [Revised: 08/10/2021] [Accepted: 06/30/2021] [Indexed: 11/24/2022] Open
Abstract
Influenza incidence forecasting is used to facilitate better health system planning and could potentially be used to allow at-risk individuals to modify their behavior during a severe seasonal influenza epidemic or a novel respiratory pandemic. For example, the US Centers for Disease Control and Prevention (CDC) runs an annual competition to forecast influenza-like illness (ILI) at the regional and national levels in the US, based on a standard discretized incidence scale. Here, we use a suite of forecasting models to analyze type-specific incidence at the smaller spatial scale of clusters of nearby counties. We used data from point-of-care (POC) diagnostic machines over three seasons, in 10 clusters, capturing: 57 counties; 1,061,891 total specimens; and 173,909 specimens positive for Influenza A. Total specimens were closely correlated with comparable CDC ILI data. Mechanistic models were substantially more accurate when forecasting influenza A positive POC data than total specimen POC data, especially at longer lead times. Also, models that fit subpopulations of the cluster (individual counties) separately were better able to forecast clusters than were models that directly fit to aggregated cluster data. Public health authorities may wish to consider developing forecasting pipelines for type-specific POC data in addition to ILI data. Simple mechanistic models will likely improve forecast accuracy when applied at small spatial scales to pathogen-specific data before being scaled to larger geographical units and broader syndromic data. Highly local forecasts may enable new public health messaging to encourage at-risk individuals to temporarily reduce their social mixing during seasonal peaks and guide public health intervention policy during potentially severe novel influenza pandemics.
Affiliation(s)
- James Turtle
- Infectious Disease Group, Predictive Science Inc., San Diego, California, United States
- Pete Riley
- Infectious Disease Group, Predictive Science Inc., San Diego, California, United States
- Michal Ben-Nun
- Infectious Disease Group, Predictive Science Inc., San Diego, California, United States
- Steven Riley
- Infectious Disease Group, Predictive Science Inc., San Diego, California, United States
- MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
8
Miliou I, Xiong X, Rinzivillo S, Zhang Q, Rossetti G, Giannotti F, Pedreschi D, Vespignani A. Predicting seasonal influenza using supermarket retail records. PLoS Comput Biol 2021; 17:e1009087. [PMID: 34252075 PMCID: PMC8297944 DOI: 10.1371/journal.pcbi.1009087] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/19/2020] [Revised: 07/22/2021] [Accepted: 05/15/2021] [Indexed: 11/19/2022] Open
Abstract
Increased availability of epidemiological data, novel digital data streams, and the rise of powerful machine learning approaches have generated a surge of research activity on real-time epidemic forecast systems. In this paper, we propose the use of a novel data source, namely retail market data to improve seasonal influenza forecasting. Specifically, we consider supermarket retail data as a proxy signal for influenza, through the identification of sentinel baskets, i.e., products bought together by a population of selected customers. We develop a nowcasting and forecasting framework that provides estimates for influenza incidence in Italy up to 4 weeks ahead. We make use of the Support Vector Regression (SVR) model to produce the predictions of seasonal flu incidence. Our predictions outperform both a baseline autoregressive model and a second baseline based on product purchases. The results show quantitatively the value of incorporating retail market data in forecasting models, acting as a proxy that can be used for the real-time analysis of epidemics.
Affiliation(s)
- Ioanna Miliou
- University of Pisa, Pisa, Italy
- ISTI-CNR, Pisa, Italy
- Xinyue Xiong
- Northeastern University, Boston, Massachusetts, United States of America
- Qian Zhang
- Northeastern University, Boston, Massachusetts, United States of America