1
Albrecht S, Broderick D, Dost K, Cheung I, Nghiem N, Wu M, Zhu J, Poonawala-Lohani N, Jamison S, Rasanathan D, Huang S, Trenholme A, Stanley A, Lawrence S, Marsh S, Castelino L, Paynter J, Turner N, McIntyre P, Riddle P, Grant C, Dobbie G, Wicker JS. Forecasting severe respiratory disease hospitalizations using machine learning algorithms. BMC Med Inform Decis Mak 2024; 24:293. [PMID: 39379946 PMCID: PMC11462891 DOI: 10.1186/s12911-024-02702-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2024] [Accepted: 09/30/2024] [Indexed: 10/10/2024] [Open Access]
Abstract
BACKGROUND: Forecasting models predicting trends in hospitalization rates have the potential to inform hospital management during seasonal epidemics of respiratory diseases and the associated surges caused by acute hospital admissions. Hospital bed requirements for elective surgery could be better planned if it were possible to foresee upcoming peaks in severe respiratory illness admissions. Forecasting models can also guide the use of intervention strategies to decrease the spread of respiratory pathogens and thus prevent local health system overload. In this study, we explore the capability of forecasting models to predict the number of hospital admissions in Auckland, New Zealand, within a three-week time horizon. Furthermore, we evaluate probabilistic forecasts and the impact on model performance when integrating laboratory data describing the circulation of respiratory viruses. METHODS: The dataset used for this exploration results from active hospital surveillance, in which the World Health Organization Severe Acute Respiratory Infection (SARI) case definition was consistently used. This research nurse-led surveillance has been implemented in two public hospitals in Auckland and provides systematic laboratory testing of SARI patients for nine respiratory viruses, including influenza, respiratory syncytial virus, and rhinovirus. The forecasting strategies used comprise automatic machine learning, one of the most recent generative pre-trained transformers, and established artificial neural network algorithms capable of univariate and multivariate forecasting. RESULTS: We found that machine learning models compute more accurate forecasts than naïve seasonal models. Furthermore, we analyzed the impact of reducing the temporal resolution of forecasts, which decreased the model error of point forecasts and made probabilistic forecasting more reliable. An additional analysis that used the laboratory data revealed strong season-to-season variations in the incidence of respiratory viruses and in how this correlates with total hospitalization cases. These variations could explain why it was not possible to improve forecasts by integrating these data. CONCLUSIONS: Active SARI surveillance and consistent data collection over time enable these data to be used to predict hospital bed utilization. These findings show the potential of machine learning as support for informing systems for proactive hospital management.
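The naïve seasonal baseline mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not specify how the baseline was built, so the season length, the toy admission counts, and the function names below are all assumptions made for illustration; the error measure (mean absolute error) is likewise a stand-in, not necessarily the one the authors used.

```python
def seasonal_naive_forecast(history, horizon, season_length=52):
    """Forecast each future step with the value seen one season earlier."""
    n = len(history)
    if n < season_length:
        raise ValueError("need at least one full season of history")
    forecasts = []
    for step in range(horizon):
        # Wrap around so horizons longer than one season reuse the last season.
        idx = n - season_length + (step % season_length)
        forecasts.append(history[idx])
    return forecasts


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and forecast values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy weekly admission counts with a hypothetical 4-week "season".
history = [10, 20, 30, 40, 12, 22, 32, 42]
baseline = seasonal_naive_forecast(history, horizon=3, season_length=4)
observed = [13, 20, 35]  # hypothetical counts for the three forecast weeks
baseline_mae = mean_absolute_error(observed, baseline)
```

A candidate model would be judged against `baseline_mae` on the same held-out weeks; a model that cannot beat this baseline adds little beyond the seasonal pattern itself.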
Affiliation(s)
- Steffen Albrecht
  - University of Auckland, 20 Symonds Street, Auckland, 1010, New Zealand
- David Broderick
  - University of Auckland, 20 Symonds Street, Auckland, 1010, New Zealand
- Katharina Dost
  - University of Auckland, 20 Symonds Street, Auckland, 1010, New Zealand
- Isabella Cheung
  - University of Auckland, 20 Symonds Street, Auckland, 1010, New Zealand
- Nhung Nghiem
  - Australian National University, 131 Garran Rd, Acton, Canberra ACT, 2601, Australia
- Milton Wu
  - University of Auckland, 20 Symonds Street, Auckland, 1010, New Zealand
- Johnny Zhu
  - University of Auckland, 20 Symonds Street, Auckland, 1010, New Zealand
- Sarah Jamison
  - University of Auckland, 20 Symonds Street, Auckland, 1010, New Zealand
- Sue Huang
  - Institute of Environmental Science and Research, 34 Kenepuru Drive, Kenepuru, Porirua, 5022, New Zealand
- Adrian Trenholme
  - Health New Zealand Counties Manukau, Middlemore Hospital, 100 Hospital Road, Auckland, 2025, New Zealand
- Alicia Stanley
  - Health New Zealand Te Toka Tumai Auckland, Auckland City Hospital, 2 Park Road, Auckland, 1023, New Zealand
- Shirley Lawrence
  - Health New Zealand Counties Manukau, Middlemore Hospital, 100 Hospital Road, Auckland, 2025, New Zealand
- Samantha Marsh
  - University of Auckland, 20 Symonds Street, Auckland, 1010, New Zealand
- Janine Paynter
  - University of Auckland, 20 Symonds Street, Auckland, 1010, New Zealand
- Nikki Turner
  - University of Auckland, 20 Symonds Street, Auckland, 1010, New Zealand
- Peter McIntyre
  - University of Otago, 362 Leith Street, Dunedin, 9016, New Zealand
- Patricia Riddle
  - University of Auckland, 20 Symonds Street, Auckland, 1010, New Zealand
- Cameron Grant
  - University of Auckland, 20 Symonds Street, Auckland, 1010, New Zealand
- Gillian Dobbie
  - University of Auckland, 20 Symonds Street, Auckland, 1010, New Zealand
- Jörg Simon Wicker
  - University of Auckland, 20 Symonds Street, Auckland, 1010, New Zealand
2
Ning S, Hussain A, Wang Q. Incorporating connectivity among Internet search data for enhanced influenza-like illness tracking. PLoS One 2024; 19:e0305579. [PMID: 39186560 PMCID: PMC11346739 DOI: 10.1371/journal.pone.0305579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Accepted: 06/02/2024] [Indexed: 08/28/2024] [Open Access]
Abstract
Big data collected from the Internet possess great potential to reveal the ever-changing trends in society. In particular, accurate infectious disease tracking with Internet data has grown in popularity, providing invaluable information for public health decision makers and the general public. However, much of the complex connectivity among the Internet search data is not effectively addressed among existing disease tracking frameworks. To this end, we propose ARGO-C (Augmented Regression with Clustered GOogle data), an integrative, statistically principled approach that incorporates the clustering structure of Internet search data to enhance the accuracy and interpretability of disease tracking. Focusing on multi-resolution %ILI (influenza-like illness) tracking, we demonstrate the improved performance and robustness of ARGO-C over benchmark methods at various geographical resolutions. We also highlight the adaptability of ARGO-C to track various diseases in addition to influenza, and to track other social or economic trends.
Affiliation(s)
- Shaoyang Ning
  - Department of Mathematics and Statistics, Williams College, Williamstown, MA, United States of America
- Ahmed Hussain
  - Department of Mathematics and Statistics, Williams College, Williamstown, MA, United States of America
- Qing Wang
  - Department of Mathematics, Wellesley College, Wellesley, MA, United States of America
3
Papagiannopoulou E, Bossa M, Deligiannis N, Sahli H. Long-Term Regional Influenza-Like-Illness Forecasting Using Exogenous Data. IEEE J Biomed Health Inform 2024; 28:3781-3792. [PMID: 38483802 DOI: 10.1109/jbhi.2024.3377529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2024]
Abstract
Disease forecasting is a longstanding problem for the research community, which aims at informing and improving decisions with the best available evidence. Specifically, the interest in respiratory disease forecasting has dramatically increased since the beginning of the coronavirus pandemic, rendering the accurate prediction of influenza-like-illness (ILI) a critical task. Although methods for short-term ILI forecasting and nowcasting have achieved good accuracy, their performance worsens for long-term ILI forecasts. Machine learning models have outperformed conventional forecasting approaches by making it possible to use diverse exogenous data sources, such as social media, internet users' search query logs, and climate data. However, the most recent deep learning ILI forecasting models use only historical occurrence data while achieving state-of-the-art results. Inspired by recent deep neural network architectures in time series forecasting, this work proposes the Regional Influenza-Like-Illness Forecasting (ReILIF) method for regional long-term ILI prediction. The proposed architecture takes advantage of diverse exogenous data, namely meteorological and population data, and introduces an efficient intermediate fusion mechanism that combines the different types of information in order to capture the variations of ILI from various views. The efficacy of the proposed approach compared to state-of-the-art ILI forecasting methods is confirmed by an extensive experimental study following standard evaluation measures.
4
Wang Y, Zhou H, Zheng L, Li M, Hu B. Using the Baidu index to predict trends in the incidence of tuberculosis in Jiangsu Province, China. Front Public Health 2023; 11:1203628. [PMID: 37533520 PMCID: PMC10390734 DOI: 10.3389/fpubh.2023.1203628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 07/05/2023] [Indexed: 08/04/2023] [Open Access]
Abstract
Objective: To analyze the time-series correlation between search terms related to tuberculosis (TB) and actual incidence data in China, to screen out the "leading" terms, and to construct a timely and efficient TB prediction model that can predict the next wave of the TB epidemic trend in advance. Methods: Monthly incidence data of tuberculosis in Jiangsu Province, China, were collected from January 2011 to December 2020. A scoping approach was used to identify TB search terms around common TB terms, prevention, symptoms and treatment. Search terms for Jiangsu Province, China, from January 2011 to December 2020 were collected from the Baidu index database. Correlation coefficients between search terms and actual incidence were calculated using Python 3.6 software. The multiple linear regression model was constructed using SPSS 26.0 software, which also calculated the goodness of fit and prediction error of the model predictions. Results: A total of 16 keywords with correlation coefficients greater than 0.6 were screened, of which 11 were leading terms. The R² of the prediction model was 0.67 and the MAPE was 10.23%. Conclusion: The TB prediction model based on Baidu Index data was able to predict the next wave of TB epidemic trends and intensity 2 months in advance. This forecasting model is currently only available for Jiangsu Province.
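The screening step and the error measure reported in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the abstract does not say which correlation coefficient was used or how a "leading" term was defined, so the sketch assumes Pearson correlation and treats a term as leading when its strongest correlation occurs at a positive lag. The function names and toy series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def lagged_correlation(search, incidence, lag):
    """Correlate search volume at month t - lag with incidence at month t."""
    if lag == 0:
        return pearson(search, incidence)
    return pearson(search[:-lag], incidence[lag:])


def best_lag(search, incidence, max_lag):
    """Return (lag, r) for the lag in 0..max_lag with the highest correlation."""
    best = (0, lagged_correlation(search, incidence, 0))
    for lag in range(1, max_lag + 1):
        r = lagged_correlation(search, incidence, lag)
        if r > best[1]:
            best = (lag, r)
    return best


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Toy data: incidence follows the search series one month later (hypothetical).
search = [1, 3, 2, 5, 4, 6, 8, 7]
incidence = [0, 2, 6, 4, 10, 8, 12, 16]
lag, r = best_lag(search, incidence, max_lag=2)
is_leading = lag > 0 and r > 0.6  # assumed screening rule, threshold from the abstract
```

Screening on the lagged rather than the same-month correlation is what allows a term to act as an early signal: a term that only correlates at lag 0 cannot support prediction in advance.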
5
Ma S, Ning S, Yang S. Joint COVID-19 and influenza-like illness forecasts in the United States using internet search information. Communications Medicine 2023; 3:39. [PMID: 36964311 PMCID: PMC10038385 DOI: 10.1038/s43856-023-00272-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Accepted: 03/09/2023] [Indexed: 03/26/2023] [Open Access]
Abstract
BACKGROUND: As the prolonged COVID-19 pandemic continues, severe seasonal influenza (flu) may occur alongside COVID-19. This could cause a "twindemic", in which there are additional burdens on health care resources and public safety compared to those occurring in the presence of a single infection. Amidst the rising trend of co-infections of the two diseases, forecasting both influenza-like illness (ILI) outbreaks and COVID-19 waves in a reliable and timely manner becomes more urgent than ever. Accurate and real-time joint prediction of the twindemic aids public health organizations and policymakers in adequate preparation and decision making. However, in the current pandemic, existing ILI and COVID-19 forecasting models face shortcomings under complex inter-disease dynamics, particularly due to the similarities in symptoms and healthcare-seeking patterns of the two diseases. METHODS: Inspired by the interconnection between ILI and COVID-19 activities, we combine related internet search and bi-disease time series information for U.S. national-level and state-level forecasts. Our proposed ARGOX-Joint-Ensemble adopts a new ensemble framework that integrates ILI and COVID-19 disease forecasting models to pool the information between the two diseases and provide joint multi-resolution and multi-target predictions. Using a winner-takes-all ensemble, our framework is able to adaptively select the most predictive COVID-19 or ILI signals. RESULTS: In the retrospective evaluation, our model steadily outperforms alternative benchmark methods, and remains competitive with other publicly available models in both point estimates and probabilistic predictions (including intervals). CONCLUSIONS: The success of our approach illustrates that pooling information between ILI and COVID-19 leads to better forecasting models than individual models for either disease.
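A winner-takes-all ensemble of the kind named in this abstract can be sketched in a few lines. The abstract does not give the selection criterion, so this sketch assumes the winner is the candidate with the lowest mean absolute error over a short trailing window; the window length, model names, and numbers are hypothetical and do not come from ARGOX-Joint-Ensemble itself.

```python
def winner_takes_all(past_forecasts, truth, next_forecasts, window=3):
    """Pick the candidate with the lowest recent error and return its next forecast.

    past_forecasts: dict of model name -> list of past forecasts aligned with truth
    truth:          list of observed values
    next_forecasts: dict of model name -> forecast for the upcoming period
    """
    recent_truth = truth[-window:]
    best_name, best_error = None, float("inf")
    for name, forecasts in past_forecasts.items():
        recent = forecasts[-window:]
        # Trailing mean absolute error (an assumed criterion).
        error = sum(abs(t - f) for t, f in zip(recent_truth, recent)) / len(recent_truth)
        if error < best_error:
            best_name, best_error = name, error
    return best_name, next_forecasts[best_name]


# Hypothetical candidates: one driven by ILI signals, one by COVID-19 signals.
truth = [10, 12, 14, 16]
past = {
    "ili_model": [9, 12, 15, 16],
    "covid_model": [10, 15, 10, 20],
}
upcoming = {"ili_model": 18.0, "covid_model": 25.0}
winner, forecast = winner_takes_all(past, truth, upcoming, window=3)
```

Because the winner is re-selected every period, the ensemble can switch between ILI-driven and COVID-19-driven signals as their relative accuracy changes, which is the adaptive behavior the abstract describes.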
Affiliation(s)
- Simin Ma
  - H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Shaoyang Ning
  - Department of Mathematics and Statistics, Williams College, Williamstown, MA, 01267, USA
- Shihao Yang
  - H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA
6
Poirier C, Bouzillé G, Bertaud V, Cuggia M, Santillana M, Lavenu A. Gastroenteritis Forecasting Assessing the Use of Web and Electronic Health Record Data With a Linear and a Nonlinear Approach: Comparison Study. JMIR Public Health Surveill 2023; 9:e34982. [PMID: 36719726 PMCID: PMC9929730 DOI: 10.2196/34982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Revised: 07/19/2022] [Accepted: 11/28/2022] [Indexed: 02/01/2023] [Open Access]
Abstract
BACKGROUND: Disease surveillance systems capable of producing accurate real-time and short-term forecasts can help public health officials design timely public health interventions to mitigate the effects of disease outbreaks in affected populations. In France, existing clinic-based disease surveillance systems produce gastroenteritis activity information that lags real time by 1 to 3 weeks. This temporal data gap prevents public health officials from having a timely epidemiological characterization of this disease at any point in time and thus leads to the design of interventions that do not take into consideration the most recent changes in dynamics. OBJECTIVE: The goal of this study was to evaluate the feasibility of using internet search query trends and electronic health records to predict acute gastroenteritis (AG) incidence rates in near real time, at the national and regional scales, and for long-term forecasts (up to 10 weeks). METHODS: We present 2 different approaches (linear and nonlinear) that produce real-time estimates, short-term forecasts, and long-term forecasts of AG activity at 2 different spatial scales in France (national and regional). Both approaches leverage disparate data sources that include disease-related internet search activity, electronic health record data, and historical disease activity. RESULTS: Our results suggest that all data sources contribute to improving gastroenteritis surveillance for long-term forecasts with the prominent predictive power of historical data owing to the strong seasonal dynamics of this disease. CONCLUSIONS: The methods we developed could help reduce the impact of the AG peak by making it possible to anticipate increased activity by up to 10 weeks.
Affiliation(s)
- Canelle Poirier
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
  - Institut national de la santé et de la recherche médicale U1099, Rennes, France
  - Laboratoire Traitement du Signal et de l'Image, Université de Rennes 1, Rennes, France
  - Centre de Données Cliniques, Centre Hospitalier Universitaire Rennes, Rennes, France
- Guillaume Bouzillé
  - Institut national de la santé et de la recherche médicale U1099, Rennes, France
  - Laboratoire Traitement du Signal et de l'Image, Université de Rennes 1, Rennes, France
  - Centre de Données Cliniques, Centre Hospitalier Universitaire Rennes, Rennes, France
- Valérie Bertaud
  - Institut national de la santé et de la recherche médicale U1099, Rennes, France
  - Laboratoire Traitement du Signal et de l'Image, Université de Rennes 1, Rennes, France
  - Centre de Données Cliniques, Centre Hospitalier Universitaire Rennes, Rennes, France
- Marc Cuggia
  - Institut national de la santé et de la recherche médicale U1099, Rennes, France
  - Laboratoire Traitement du Signal et de l'Image, Université de Rennes 1, Rennes, France
  - Centre de Données Cliniques, Centre Hospitalier Universitaire Rennes, Rennes, France
- Mauricio Santillana
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
  - Harvard Tseng-Hsi Chan School of Public Health, Boston, MA, United States
  - Machine Intelligence Group for the Betterment of Health and the Environment, Network Science Institute, Northeastern University, Boston, MA, United States
- Audrey Lavenu
  - Faculté de médecine, Université de Rennes 1, Rennes, France
  - Institut de Recherche Mathématique de Rennes, Rennes, France
  - Institut national de la santé et de la recherche médicale CIC 1414, Université de Rennes 1, Rennes, France
7
Stolerman LM, Clemente L, Poirier C, Parag KV, Majumder A, Masyn S, Resch B, Santillana M. Using digital traces to build prospective and real-time county-level early warning systems to anticipate COVID-19 outbreaks in the United States. Science Advances 2023; 9:eabq0199. [PMID: 36652520 PMCID: PMC9848273 DOI: 10.1126/sciadv.abq0199] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/12/2022] [Accepted: 12/19/2022] [Indexed: 06/17/2023]
Abstract
Coronavirus disease 2019 (COVID-19) continues to affect the world, and the design of strategies to curb disease outbreaks requires close monitoring of their trajectories. We present machine learning methods that leverage internet-based digital traces to anticipate sharp increases in COVID-19 activity in U.S. counties. In a complementary direction to the efforts led by the Centers for Disease Control and Prevention (CDC), our models are designed to detect the time when an uptrend in COVID-19 activity will occur. Motivated by the need for finer spatial resolution epidemiological insights, we build upon previous efforts conceived at the state level. Our methods, tested in an out-of-sample manner as events were unfolding in 97 counties representative of multiple population sizes across the United States, frequently anticipated increases in COVID-19 activity 1 to 6 weeks before local outbreaks, defined as the point when the effective reproduction number Rt becomes larger than 1 for a period of 2 weeks.
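The outbreak definition used in this abstract (Rt larger than 1 for a period of 2 weeks) can be turned into a small onset detector. The sketch below assumes daily Rt estimates and reads "2 weeks" as 14 consecutive days above the threshold; both are assumptions, as are the function name and the toy series.

```python
def outbreak_onset(rt_series, threshold=1.0, min_days=14):
    """Return the index of the first day of the first run of at least
    min_days consecutive Rt values above threshold, or None if no such run."""
    run_start = None
    for day, rt in enumerate(rt_series):
        if rt > threshold:
            if run_start is None:
                run_start = day  # a new run above the threshold begins
            if day - run_start + 1 >= min_days:
                return run_start
        else:
            run_start = None  # run broken; start over
    return None


# Hypothetical daily Rt estimates: a 10-day excursion that does not qualify,
# one day below the threshold, then a sustained 14-day run.
rt = [0.9] * 5 + [1.1] * 10 + [0.95] + [1.2] * 14
onset_day = outbreak_onset(rt)
```

An early warning signal would then be scored by how many weeks before `onset_day` it fired, which is the sense of the 1 to 6 week lead time reported above.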
Affiliation(s)
- Lucas M. Stolerman
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Department of Mathematics, Oklahoma State University, Stillwater, OK, USA
- Leonardo Clemente
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Machine Intelligence Group for the Betterment of Health and the Environment, Network Science Institute, Northeastern University, Boston, MA, USA
- Canelle Poirier
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Kris V. Parag
  - NIHR Health Protection Research Unit, Behavioural Science and Evaluation, University of Bristol, Bristol, UK
- Serge Masyn
  - Global Public Health, Janssen R&D, Beerse, Belgium
- Bernd Resch
  - Department of Geoinformatics - Z-GIS, University of Salzburg, Salzburg, Austria
  - Center for Geographic Analysis, Harvard University, Cambridge, MA, USA
- Mauricio Santillana
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Machine Intelligence Group for the Betterment of Health and the Environment, Network Science Institute, Northeastern University, Boston, MA, USA
  - Harvard University, T.H. Chan School of Public Health, Boston, MA, USA
8
Cheatham S, Kummervold PE, Parisi L, Lanfranchi B, Croci I, Comunello F, Rota MC, Filia A, Tozzi AE, Rizzo C, Gesualdo F. Understanding the vaccine stance of Italian tweets and addressing language changes through the COVID-19 pandemic: Development and validation of a machine learning model. Front Public Health 2022; 10:948880. [PMID: 35968436 PMCID: PMC9372360 DOI: 10.3389/fpubh.2022.948880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Accepted: 07/11/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
Social media is increasingly being used to express opinions and attitudes toward vaccines. The vaccine stance of social media posts can be classified in almost real time using machine learning. We describe the use of a Transformer-based machine learning model for analyzing the vaccine stance of Italian tweets, and demonstrate the need to address changes over time in vaccine-related language through periodic model retraining. Vaccine-related tweets were collected through a platform developed for the European Joint Action on Vaccination. Two datasets were collected, the first between November 2019 and June 2020, the second from April to September 2021. The tweets were manually categorized by three independent annotators. After cleaning, the total dataset consisted of 1,736 tweets with 3 categories (promotional, neutral, and discouraging). The manually classified tweets were used to train and test various machine learning models. The model that classified the data most similarly to humans was XLM-Roberta-large, a multilingual version of the Transformer-based model RoBERTa. The model hyper-parameters were tuned and then the model was run five times. The fine-tuned model with the best F-score over the validation dataset was selected. Running the selected fine-tuned model on just the first test dataset resulted in an accuracy of 72.8% (F-score 0.713). Using this model on the second test dataset resulted in a 10% drop in accuracy to 62.1% (F-score 0.617), indicating that the model recognized a difference in language between the datasets. On the combined test datasets the accuracy was 70.1% (F-score 0.689). Retraining the model using data from the first and second datasets increased the accuracy over the second test dataset to 71.3% (F-score 0.713), a 9% improvement over using just the first dataset for training. The accuracy over the first test dataset remained the same at 72.8% (F-score 0.721). The accuracy over the combined test datasets was then 72.4% (F-score 0.720), a 2% improvement. Through fine-tuning a machine-learning model on task-specific data, the accuracy achieved in categorizing tweets was close to that expected of a single human annotator. Regular training of machine-learning models with recent data is advisable to maximize accuracy.
Affiliation(s)
- Susan Cheatham
  - Multifactorial and Complex Diseases Research Area, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- Lorenza Parisi
  - Department of Human Sciences, Link Campus University, Rome, Italy
- Barbara Lanfranchi
  - Multifactorial and Complex Diseases Research Area, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- Ileana Croci
  - Multifactorial and Complex Diseases Research Area, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- Francesca Comunello
  - Department of Communication and Social Research, Sapienza University, Rome, Italy
- Maria Cristina Rota
  - Department of Infectious Diseases, Istituto Superiore di Sanità, Rome, Italy
- Antonietta Filia
  - Department of Infectious Diseases, Istituto Superiore di Sanità, Rome, Italy
- Alberto Eugenio Tozzi
  - Multifactorial and Complex Diseases Research Area, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- Caterina Rizzo
  - Multifactorial and Complex Diseases Research Area, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
  - Department of Translational Research and New Technologies in Medicine and Surgery, Pisa University, Pisa, Italy
  - *Correspondence: Caterina Rizzo
- Francesco Gesualdo
  - Multifactorial and Complex Diseases Research Area, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
9
Gunasekeran D, Chew AMK, Chandrasekar E, Rajendram P, Kandarpa V, Rajendram M, Chia A, Smith H, Leong CK. The impact and applications of social media platforms for public health responses before and during COVID-19. J Med Internet Res 2022; 24:e33680. [PMID: 35129456 PMCID: PMC9004624 DOI: 10.2196/33680] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 09/19/2021] [Revised: 01/27/2022] [Accepted: 02/04/2022] [Indexed: 12/21/2022] [Open Access]
Abstract
Background: Social media platforms have numerous potential benefits and drawbacks for public health, which have been described in the literature. The COVID-19 pandemic has exposed our limited knowledge regarding the potential health impact of these platforms, which have been detrimental to public health responses in many regions. Objective: This review aims to highlight a brief history of social media in health care and report its potential negative and positive public health impacts, which have been characterized in the literature. Methods: We searched electronic bibliographic databases, including PubMed (including Medline) and Institute of Electrical and Electronics Engineers Xplore, from December 10, 2015, to December 10, 2020. We screened the titles and abstracts and selected relevant reports for review of full text and reference lists. These were analyzed thematically and consolidated into applications of social media platforms for public health. Results: The positive and negative impacts of social media platforms on public health are catalogued in this report on the basis of recent research. These findings are discussed in the context of improving future public health responses and incorporating other emerging digital technology domains such as artificial intelligence. However, there is a need for more research with pragmatic methodology that evaluates the impact of specific digital interventions to inform future health policy. Conclusions: Recent research has highlighted the potential negative impact of social media platforms on population health, as well as potentially useful applications for public health communication, monitoring, and predictions. More research is needed to objectively investigate measures to mitigate its negative impact while harnessing effective applications for the benefit of public health.
Affiliation(s)
- Mallika Rajendram
  - National University of Singapore (NUS), 10 Medical Drive, Singapore, SG
- Audrey Chia
  - National University of Singapore (NUS), 10 Medical Drive, Singapore, SG
- Helen Smith
  - Lee Kong Chian School of Medicine (LKCMedicine), Singapore, SG
- Choon Kit Leong
  - National University of Singapore (NUS), 10 Medical Drive, Singapore, SG
  - Mission Medical Clinic, Singapore, SG
10
Eum Y, Yoo EH. Using GPS-enabled mobile phones to evaluate the associations between human mobility changes and the onset of influenza illness. Spat Spatiotemporal Epidemiol 2022; 40:100458. [PMID: 35120680 PMCID: PMC8818086 DOI: 10.1016/j.sste.2021.100458] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/15/2019] [Revised: 09/19/2021] [Accepted: 10/18/2021] [Indexed: 02/03/2023]
Abstract
Due to the challenges in data collection, there are few studies examining how individuals' routine mobility patterns change when they experience influenza-like symptoms (ILS). In the present study, we aimed to assess the association between changes in routine mobility and ILS using mobile phone-based GPS traces and self-reported surveys from 1,155 participants over the 2016-2017 influenza season. We used a set of mobility metrics to capture individuals' routine mobility patterns and matched them to their weekly ILS survey responses. For the statistical analysis, we used a time-stratified case-crossover design and conducted a stratified analysis to examine whether such associations are moderated by demographic and socioeconomic factors, such as age, gender, occupational status, neighborhood poverty and education levels, and work type. We found statistically significant associations between reduced routine mobility patterns and the experience of ILS. Results also indicated that the association between reduced mobility and ILS was significant only for female participants and for participants with high socioeconomic status. Our findings offer an improved understanding of ILS-associated mobility changes at the individual level and suggest the potential of individual mobility data for influenza surveillance.
Affiliation(s)
- Youngseob Eum
  - Department of Geography, State University of New York at Buffalo, Buffalo, NY, USA
- Eun-Hye Yoo
  - Department of Geography, State University of New York at Buffalo, Buffalo, NY, USA
11
He Y, Zhao Y, Chen Y, Yuan H, Tsui K. Nowcasting influenza‐like illness (ILI) via a deep learning approach using Google search data: An empirical study on Taiwan ILI. Int J Intell Syst 2021. [DOI: 10.1002/int.22788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Yuxin He
  - College of Urban Transportation and Logistics, Shenzhen Technology University, Shenzhen, China
- Yang Zhao
  - School of Public Health (Shenzhen), Sun Yat‐Sen University, Guangzhou, China
- Yupeng Chen
  - Trial Retail Engineering (T. R. E. China), Yantai, China
- Hsiang‐Yu Yuan
  - Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
- Kwok‐Leung Tsui
  - Department of Industrial and Systems Engineering, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
12
Abstract
Influenza is a common respiratory infection that causes considerable morbidity and mortality worldwide each year. In recent years, along with the improvement in computational resources, there have been a number of important developments in the science of influenza surveillance and forecasting. Influenza surveillance systems have been improved by synthesizing multiple sources of information. Influenza forecasting has developed into an active field, with annual challenges in the United States that have stimulated improved methodologies. Work continues on the optimal approaches to assimilating surveillance data and information on relevant driving factors to improve estimates of the current situation (nowcasting) and to forecast future dynamics.
Affiliation(s)
- Sheikh Taslim Ali
- World Health Organization Collaborating Centre for Infectious Disease Epidemiology and Control, School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong Special Administrative Region, China
- Benjamin J Cowling
- World Health Organization Collaborating Centre for Infectious Disease Epidemiology and Control, School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong Special Administrative Region, China
|
13
|
Yang S, Bao Y. Comprehensive learning particle swarm optimization enabled modeling framework for multi-step-ahead influenza prediction. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107994] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/02/2023]
|
14
|
Rahman A, Jiang D. Regional and temporal patterns of influenza: Application of functional data analysis. Infect Dis Model 2021; 6:1061-1072. [PMID: 34541424 PMCID: PMC8433253 DOI: 10.1016/j.idm.2021.08.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/13/2021] [Revised: 08/26/2021] [Accepted: 08/26/2021] [Indexed: 12/01/2022] Open
Abstract
BACKGROUND The accurate estimation of temporal patterns of influenza may help in utilizing hospital resources and guiding influenza surveillance. This paper proposes functional data analysis (FDA) to improve the prediction of temporal patterns of influenza. METHODS We illustrate FDA methods using the weekly Influenza-like Illness (ILI) activity level data from the U.S. We propose to use the Fourier basis function for transforming discrete weekly data to the smoothed functional ILI activities. Functional analysis of variance (FANOVA) is used to examine the regional differences in temporal patterns and the impact of state's political orientation. RESULTS The ILI activity has a very distinct peak at the beginning and end of the year. There are significant differences in average level of ILI activities among geographic regions. However, the temporal patterns in terms of the peak and flat time are quite consistent across regions. The geographic and temporal patterns of ILI activities also depend on the political make-up of states. The states affiliated with Republicans had higher ILI activities than those affiliated with Democrats across the whole year. The influence of political party affiliation on temporal pattern is quite different among geographic regions. CONCLUSIONS Functional data analysis can help us to reveal the temporal variability in average ILI levels, rate of change in ILI levels, and the effect of geographical regions. Consideration should be given to wider application of FDA to generate more accurate estimates in public health and biomedical research.
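The Fourier-basis smoothing step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one full period of equally spaced weekly observations and no roughness penalty, in which case the least-squares Fourier coefficients reduce to simple projections. The function name `fit_fourier_basis`, the number of harmonics, and the synthetic series are hypothetical choices made for illustration only.

```python
import math


def fit_fourier_basis(values, n_harmonics):
    """Least-squares fit of a truncated Fourier basis to one period of
    equally spaced observations; returns a smooth function of time t."""
    n = len(values)
    if not 0 <= n_harmonics < n / 2:
        raise ValueError("need 0 <= n_harmonics < len(values) / 2")
    mean = sum(values) / n
    coeffs = []
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k / n
        # For equally spaced data over a full period the basis functions are
        # orthogonal, so each coefficient is an independent projection.
        a = 2.0 / n * sum(v * math.cos(w * i) for i, v in enumerate(values))
        b = 2.0 / n * sum(v * math.sin(w * i) for i, v in enumerate(values))
        coeffs.append((a, b))

    def smooth(t):
        total = mean
        for k, (a, b) in enumerate(coeffs, start=1):
            w = 2.0 * math.pi * k / n
            total += a * math.cos(w * t) + b * math.sin(w * t)
        return total

    return smooth


# Synthetic (made-up) weekly activity levels for one 52-week year.
weeks = 52
ili = [
    3.0
    + 2.0 * math.cos(2 * math.pi * i / weeks)
    + 0.5 * math.sin(4 * math.pi * i / weeks)
    for i in range(weeks)
]
curve = fit_fourier_basis(ili, n_harmonics=3)
print(round(curve(0.5), 3))  # smoothed value halfway between weeks 0 and 1
```

Because the result is a continuous function, it can be evaluated between observation weeks, which is what makes derivative-based summaries such as the rate of change in ILI levels possible.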
Affiliation(s)
- Azizur Rahman
- Department of Community Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
- Department of Statistics, Jahangirnagar University, Savar, Dhaka, Bangladesh
- Depeng Jiang
- Department of Community Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
- School of Sciences, Nanjing Forest University, Nanjing, Jiangsu, China
|
15
|
Lee K, Ray J, Safta C. The predictive skill of convolutional neural networks models for disease forecasting. PLoS One 2021; 16:e0254319. [PMID: 34242349 PMCID: PMC8270135 DOI: 10.1371/journal.pone.0254319] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/23/2020] [Accepted: 06/24/2021] [Indexed: 11/18/2022] Open
Abstract
In this paper we investigate the utility of one-dimensional convolutional neural network (CNN) models in epidemiological forecasting. Deep learning models, in particular variants of recurrent neural networks (RNNs), have been studied for ILI (Influenza-Like Illness) forecasting, and have achieved a higher forecasting skill compared to conventional models such as ARIMA. In this study, we adapt two neural networks that employ one-dimensional temporal convolutional layers as a primary building block (temporal convolutional networks and simple neural attentive meta-learners) for epidemiological forecasting. We then test them with influenza data from the US collected over 2010-2019. We find that epidemiological forecasting with CNNs is feasible, and their forecasting skill is comparable to, and at times, superior to, plain RNNs. Thus CNNs and RNNs bring the power of nonlinear transformations to purely data-driven epidemiological models, a capability that heretofore has been limited to more elaborate mechanistic/compartmental disease models.
Affiliation(s)
- Kookjin Lee
- Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, AZ, United States of America
- Extreme-Scale Data Science and Analytics, Sandia National Laboratories, Livermore, CA, United States of America
- Jaideep Ray
- Extreme-Scale Data Science and Analytics, Sandia National Laboratories, Livermore, CA, United States of America
- Cosmin Safta
- Quantitative Modeling and Analysis, Sandia National Laboratories, Livermore, CA, United States of America
|
16
|
Miliou I, Xiong X, Rinzivillo S, Zhang Q, Rossetti G, Giannotti F, Pedreschi D, Vespignani A. Predicting seasonal influenza using supermarket retail records. PLoS Comput Biol 2021; 17:e1009087. [PMID: 34252075 PMCID: PMC8297944 DOI: 10.1371/journal.pcbi.1009087] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/19/2020] [Revised: 07/22/2021] [Accepted: 05/15/2021] [Indexed: 11/19/2022] Open
Abstract
Increased availability of epidemiological data, novel digital data streams, and the rise of powerful machine learning approaches have generated a surge of research activity on real-time epidemic forecast systems. In this paper, we propose the use of a novel data source, namely retail market data to improve seasonal influenza forecasting. Specifically, we consider supermarket retail data as a proxy signal for influenza, through the identification of sentinel baskets, i.e., products bought together by a population of selected customers. We develop a nowcasting and forecasting framework that provides estimates for influenza incidence in Italy up to 4 weeks ahead. We make use of the Support Vector Regression (SVR) model to produce the predictions of seasonal flu incidence. Our predictions outperform both a baseline autoregressive model and a second baseline based on product purchases. The results show quantitatively the value of incorporating retail market data in forecasting models, acting as a proxy that can be used for the real-time analysis of epidemics.
Affiliation(s)
- Ioanna Miliou
- University of Pisa, Pisa, Italy
- ISTI-CNR, Pisa, Italy
- Xinyue Xiong
- Northeastern University, Boston, Massachusetts, United States of America
- Qian Zhang
- Northeastern University, Boston, Massachusetts, United States of America
|
17
|
Teo JTH, Dinu V, Bernal W, Davidson P, Oliynyk V, Breen C, Barker RD, Dobson RJB. Real-time clinician text feeds from electronic health records. NPJ Digit Med 2021; 4:35. [PMID: 33627748 PMCID: PMC7904856 DOI: 10.1038/s41746-021-00406-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/02/2020] [Accepted: 01/26/2021] [Indexed: 11/09/2022] Open
Abstract
Analyses of search engine and social media feeds have been attempted for infectious disease outbreaks, but have been found to be susceptible to artefactual distortions from health scares or keyword spamming in social media or the public internet. We describe an approach using real-time aggregation of keywords and phrases of freetext from real-time clinician-generated documentation in electronic health records to produce a customisable real-time viral pneumonia signal providing up to 4 days warning for secondary care capacity planning. This low-cost approach is open-source, is locally customisable, is not dependent on any specific electronic health record system and can provide an ensemble of signals if deployed at multiple organisational scales.
Affiliation(s)
- James T H Teo
- Kings College Hospital NHS Foundation Trust, London, United Kingdom
- Guys & St Thomas Hospital NHS Foundation Trust, London, United Kingdom
- Institute of Psychiatry, Psychology and Neuroscience, Kings College London, London, United Kingdom
- Vlad Dinu
- Institute of Psychiatry, Psychology and Neuroscience, Kings College London, London, United Kingdom
- William Bernal
- Kings College Hospital NHS Foundation Trust, London, United Kingdom
- Phil Davidson
- Kings College Hospital NHS Foundation Trust, London, United Kingdom
- Vitaliy Oliynyk
- Guys & St Thomas Hospital NHS Foundation Trust, London, United Kingdom
- Cormac Breen
- Guys & St Thomas Hospital NHS Foundation Trust, London, United Kingdom
- Richard D Barker
- Kings College Hospital NHS Foundation Trust, London, United Kingdom
- Richard J B Dobson
- Institute of Psychiatry, Psychology and Neuroscience, Kings College London, London, United Kingdom
|
18
|
Use Internet search data to accurately track state level influenza epidemics. Sci Rep 2021; 11:4023. [PMID: 33597556 PMCID: PMC7889878 DOI: 10.1038/s41598-021-83084-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/05/2020] [Accepted: 01/28/2021] [Indexed: 11/22/2022] Open
Abstract
For epidemics control and prevention, timely insights of potential hot spots are invaluable. Alternative to traditional epidemic surveillance, which often lags behind real time by weeks, big data from the Internet provide important information of the current epidemic trends. Here we present a methodology, ARGOX (Augmented Regression with GOogle data CROSS space), for accurate real-time tracking of state-level influenza epidemics in the United States. ARGOX combines Internet search data at the national, regional and state levels with traditional influenza surveillance data from the Centers for Disease Control and Prevention, and accounts for both the spatial correlation structure of state-level influenza activities and the evolution of people’s Internet search pattern. ARGOX achieves on average 28% error reduction over the best alternative for real-time state-level influenza estimation for 2014 to 2020. ARGOX is robust and reliable and can be potentially applied to track county- and city-level influenza activity and other infectious diseases.
|
19
|
Mehrmolaei S. EPTs-TL: A two-level approach for efficient event prediction in healthcare. Artif Intell Med 2020; 111:101999. [PMID: 33461692 DOI: 10.1016/j.artmed.2020.101999] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/20/2018] [Revised: 11/23/2020] [Accepted: 11/24/2020] [Indexed: 11/18/2022]
Abstract
Recently, event prediction on time series (EPTs) has been discussed as an important research trend whose use is growing for supporting decisions in various sciences. In the real world, event-based time series analysis is one of the challenging prediction problems in healthcare, and it plays a direct and key role in supporting health management. In this paper, an efficient two-level (TL) approach, named EPTs-TL, is proposed for the EPTs problem in healthcare. At the first level, unseen time series data is predicted by using an enhanced hybrid model based on soft computing technology. Then, a new feature extraction-based method is proposed for fuzzy detection of future events. The EPTs-TL approach employs three components across its two levels: weighting, fuzzy logic, and metaheuristics. The empirical results demonstrate the excellent performance of the EPTs-TL approach in comparison to conventional prediction models in healthcare and medicine. The proposed approach can also serve as a strong tool to handle the complex and uncertain behaviors of time series, analyze their unusual variations, forewarn of possible critical situations in society, and perform fuzzy prediction of events in healthcare.
Affiliation(s)
- Soheila Mehrmolaei
- Data Mining Lab, Department of Computer Engineering, Faculty of Engineering, Alzahra University, Tehran, Iran.
|
20
|
Leuba SI, Yaesoubi R, Antillon M, Cohen T, Zimmer C. Tracking and predicting U.S. influenza activity with a real-time surveillance network. PLoS Comput Biol 2020; 16:e1008180. [PMID: 33137088 PMCID: PMC7707518 DOI: 10.1371/journal.pcbi.1008180] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/25/2019] [Revised: 12/01/2020] [Accepted: 07/22/2020] [Indexed: 12/29/2022] Open
Abstract
Each year in the United States, influenza causes illness in 9.2 to 35.6 million individuals and is responsible for 12,000 to 56,000 deaths. The U.S. Centers for Disease Control and Prevention (CDC) tracks influenza activity through a national surveillance network. These data are only available after a delay of 1 to 2 weeks, and thus influenza epidemiologists and transmission modelers have explored the use of other data sources to produce more timely estimates and predictions of influenza activity. We evaluated whether data collected from a national commercial network of influenza diagnostic machines could produce valid estimates of the current burden and help to predict influenza trends in the United States. Quidel Corporation provided us with de-identified influenza test results transmitted in real-time from a national network of influenza test machines called the Influenza Test System (ITS). We used this ITS dataset to estimate and predict influenza-like illness (ILI) activity in the United States over the 2015-2016 and 2016-2017 influenza seasons. First, we developed linear logistic models on national and regional geographic scales that accurately estimated two CDC influenza metrics: the proportion of influenza test results that are positive and the proportion of physician visits that are ILI-related. We then used our estimated ILI-related proportion of physician visits in transmission models to produce improved predictions of influenza trends in the United States at both the regional and national scale. These findings suggest that ITS can be leveraged to improve "nowcasts" and short-term forecasts of U.S. influenza activity.
Affiliation(s)
- Sequoia I. Leuba
- Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA
- Reza Yaesoubi
- Health Policy and Management, Yale School of Public Health, New Haven, CT, USA
- Marina Antillon
- Household Economics and Health Systems Research Unit, Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
- Ted Cohen
- Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA
- Christoph Zimmer
- Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA
|
21
|
Cheng HY, Wu YC, Lin MH, Liu YL, Tsai YY, Wu JH, Pan KH, Ke CJ, Chen CM, Liu DP, Lin IF, Chuang JH. Applying Machine Learning Models with An Ensemble Approach for Accurate Real-Time Influenza Forecasting in Taiwan: Development and Validation Study. J Med Internet Res 2020; 22:e15394. [PMID: 32755888 PMCID: PMC7439145 DOI: 10.2196/15394] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/08/2019] [Revised: 12/21/2019] [Accepted: 06/13/2020] [Indexed: 12/14/2022] Open
Abstract
Background Changeful seasonal influenza activity in subtropical areas such as Taiwan causes problems in epidemic preparedness. The Taiwan Centers for Disease Control has maintained real-time national influenza surveillance systems since 2004. Except for timely monitoring, epidemic forecasting using the national influenza surveillance data can provide pivotal information for public health response. Objective We aimed to develop predictive models using machine learning to provide real-time influenza-like illness forecasts. Methods Using surveillance data of influenza-like illness visits from emergency departments (from the Real-Time Outbreak and Disease Surveillance System), outpatient departments (from the National Health Insurance database), and the records of patients with severe influenza with complications (from the National Notifiable Disease Surveillance System), we developed 4 machine learning models (autoregressive integrated moving average, random forest, support vector regression, and extreme gradient boosting) to produce weekly influenza-like illness predictions for a given week and 3 subsequent weeks. We established a framework of the machine learning models and used an ensemble approach called stacking to integrate these predictions. We trained the models using historical data from 2008-2014. We evaluated their predictive ability during 2015-2017 for each of the 4-week time periods using Pearson correlation, mean absolute percentage error (MAPE), and hit rate of trend prediction. A dashboard website was built to visualize the forecasts, and the results of real-world implementation of this forecasting framework in 2018 were evaluated using the same metrics. 
Results All models could accurately predict the timing and magnitudes of the seasonal peaks in the then-current week (nowcast) (ρ=0.802-0.965; MAPE: 5.2%-9.2%; hit rate: 0.577-0.756), 1-week (ρ=0.803-0.918; MAPE: 8.3%-11.8%; hit rate: 0.643-0.747), 2-week (ρ=0.783-0.867; MAPE: 10.1%-15.3%; hit rate: 0.669-0.734), and 3-week forecasts (ρ=0.676-0.801; MAPE: 12.0%-18.9%; hit rate: 0.643-0.786), especially the ensemble model. In real-world implementation in 2018, the forecasting performance was still accurate in nowcasts (ρ=0.875-0.969; MAPE: 5.3%-8.0%; hit rate: 0.582-0.782) and remained satisfactory in 3-week forecasts (ρ=0.721-0.908; MAPE: 7.6%-13.5%; hit rate: 0.596-0.904). Conclusions This machine learning and ensemble approach can make accurate, real-time influenza-like illness forecasts for a 4-week period, and thus, facilitate decision making.
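The three evaluation metrics named in this abstract (Pearson correlation, MAPE, and hit rate of trend prediction) can be sketched as follows. This is an illustrative sketch, not the study's code. In particular, the abstract does not define the hit rate, so the definition used here is an assumption: a forecast counts as a hit when its direction of change relative to the previous observed week matches the direction of the observed change. All function names and the example numbers are hypothetical.

```python
import math


def pearson(observed, predicted):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed must be nonzero)."""
    return 100.0 * sum(
        abs((o - p) / o) for o, p in zip(observed, predicted)
    ) / len(observed)


def sign(x):
    return (x > 0) - (x < 0)


def hit_rate(observed, predicted):
    """Share of weeks whose predicted trend direction matches the observed one.
    ASSUMPTION: trend is measured against the previous observed week."""
    hits = 0
    for t in range(1, len(observed)):
        actual_change = observed[t] - observed[t - 1]
        forecast_change = predicted[t] - observed[t - 1]
        hits += sign(actual_change) == sign(forecast_change)
    return hits / (len(observed) - 1)


# Made-up weekly ILI visit counts and forecasts, for illustration only.
obs = [100, 120, 150, 130, 110]
pred = [100, 110, 160, 155, 100]
print(pearson(obs, pred), mape(obs, pred), hit_rate(obs, pred))
```

MAPE is undefined when an observed value is zero, which is one reason it suits aggregated visit counts better than sparse series.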
Affiliation(s)
- Min-Hau Lin
- Taiwan Centers for Disease Control, Taipei, Taiwan
- Yu-Lun Liu
- Taiwan Centers for Disease Control, Taipei, Taiwan
- Jo-Hua Wu
- Value Lab, Acer Inc., Taipei, Taiwan
- Chih-Jung Ke
- Taiwan Centers for Disease Control, Taipei, Taiwan
- Ding-Ping Liu
- Taiwan Centers for Disease Control, Taipei, Taiwan
- National Taipei University of Nursing and Health Sciences, Taipei, Taiwan
- I-Feng Lin
- Institute of Public Health, National Yang-Ming University, Taipei, Taiwan
- Jen-Hsiang Chuang
- Taiwan Centers for Disease Control, Taipei, Taiwan
- Institute of Public Health, National Yang-Ming University, Taipei, Taiwan
|
22
|
Aiken EL, McGough SF, Majumder MS, Wachtel G, Nguyen AT, Viboud C, Santillana M. Real-time estimation of disease activity in emerging outbreaks using internet search information. PLoS Comput Biol 2020; 16:e1008117. [PMID: 32804932 PMCID: PMC7451983 DOI: 10.1371/journal.pcbi.1008117] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/25/2019] [Revised: 08/27/2020] [Accepted: 07/01/2020] [Indexed: 11/18/2022] Open
Abstract
Understanding the behavior of emerging disease outbreaks in, or ahead of, real-time could help healthcare officials better design interventions to mitigate impacts on affected populations. Most healthcare-based disease surveillance systems, however, have significant inherent reporting delays due to data collection, aggregation, and distribution processes. Recent work has shown that machine learning methods leveraging a combination of traditionally collected epidemiological information and novel Internet-based data sources, such as disease-related Internet search activity, can produce meaningful "nowcasts" of disease incidence ahead of healthcare-based estimates, with most successful case studies focusing on endemic and seasonal diseases such as influenza and dengue. Here, we apply similar computational methods to emerging outbreaks in geographic regions where no historical presence of the disease of interest has been observed. By combining limited available historical epidemiological data with disease-related Internet search activity, we retrospectively estimate disease activity in five recent outbreaks weeks ahead of traditional surveillance methods. We find that the proposed computational methods frequently provide useful real-time incidence estimates that can help fill temporal data gaps resulting from surveillance reporting delays. However, the proposed methods are limited by issues of sample bias and skew in search query volumes, perhaps as a result of media coverage.
Affiliation(s)
- Emily L. Aiken
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
- Sarah F. McGough
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Maimuna S. Majumder
- Department of Healthcare Policy, Harvard Medical School, Boston, Massachusetts, United States of America
- Gal Wachtel
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
- Andre T. Nguyen
- Booz Allen Hamilton, Columbia, Maryland, United States of America
- University of Maryland, Baltimore County, Baltimore, Maryland, United States of America
- Cecile Viboud
- Fogarty International Center, National Institutes of Health, Bethesda, Maryland, United States of America
- Mauricio Santillana
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, United States of America
|
23
|
Al-qaness MAA, Ewees AA, Fan H, Abd Elaziz M. Optimized Forecasting Method for Weekly Influenza Confirmed Cases. Int J Environ Res Public Health 2020; 17:E3510. [PMID: 32443409 PMCID: PMC7277888 DOI: 10.3390/ijerph17103510] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/30/2020] [Revised: 05/08/2020] [Accepted: 05/12/2020] [Indexed: 11/16/2022]
Abstract
Influenza epidemics are a serious threat worldwide, causing thousands of deaths every year, and can be considered a public health emergency that needs further attention and investigation. Forecasting influenza incidence or confirmed cases is very important for governments and health organizations to prepare the necessary policies and plans. In this paper, we present an enhanced adaptive neuro-fuzzy inference system (ANFIS) to forecast the weekly confirmed influenza cases in China and the USA using official datasets. To overcome the limitations of the original ANFIS, we use two metaheuristics, called flower pollination algorithm (FPA) and sine cosine algorithm (SCA), to enhance the prediction of the ANFIS. The proposed FPASCA-ANFIS is evaluated using two datasets collected from the CDC and WHO websites. Furthermore, it was compared to some previous state-of-the-art approaches. Experimental results confirmed that the FPASCA-ANFIS outperformed the compared methods on various measures, including RMSRE, MAPE, MAE, and R².
Affiliation(s)
- Mohammed A. A. Al-qaness
- State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
- Ahmed A. Ewees
- Department of e-Systems, University of Bisha, Bisha 61922, Saudi Arabia
- Department of Computer, Damietta University, Damietta 34517, Egypt
- Hong Fan
- State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
- Mohamed Abd Elaziz
- Department of Mathematics, Faculty of Science, Zagazig University, Zagazig 44519, Egypt
|
24
|
Beesley LJ, Salvatore M, Fritsche LG, Pandit A, Rao A, Brummett C, Willer CJ, Lisabeth LD, Mukherjee B. The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities. Stat Med 2020; 39:773-800. [PMID: 31859414 PMCID: PMC7983809 DOI: 10.1002/sim.8445] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 10/10/2018] [Revised: 09/10/2019] [Accepted: 11/16/2019] [Indexed: 01/03/2023]
Abstract
Biobanks linked to electronic health records provide rich resources for health-related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large-scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis-generating studies of disease-treatment, disease-exposure, and disease-gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank-based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank-based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.
Affiliation(s)
- Anita Pandit
- University of Michigan, Department of Biostatistics
- Arvind Rao
- University of Michigan, Department of Computational Medicine and Bioinformatics
- Chad Brummett
- University of Michigan, Department of Anesthesiology
- Cristen J. Willer
- University of Michigan, Department of Computational Medicine and Bioinformatics
|
25
|
Barros JM, Duggan J, Rebholz-Schuhmann D. The Application of Internet-Based Sources for Public Health Surveillance (Infoveillance): Systematic Review. J Med Internet Res 2020; 22:e13680. [PMID: 32167477 PMCID: PMC7101503 DOI: 10.2196/13680] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 02/24/2019] [Revised: 09/18/2019] [Accepted: 11/26/2019] [Indexed: 12/30/2022] Open
Abstract
Background Public health surveillance is based on the continuous and systematic collection, analysis, and interpretation of data. This informs the development of early warning systems to monitor epidemics and documents the impact of intervention measures. The introduction of digital data sources, and specifically sources available on the internet, has impacted the field of public health surveillance. New opportunities enabled by the underlying availability and scale of internet-based sources (IBSs) have paved the way for novel approaches for disease surveillance, exploration of health communities, and the study of epidemic dynamics. This field and approach is also known as infodemiology or infoveillance. Objective This review aimed to assess research findings regarding the application of IBSs for public health surveillance (infodemiology or infoveillance). To achieve this, we have presented a comprehensive systematic literature review with a focus on these sources and their limitations, the diseases targeted, and commonly applied methods. Methods A systematic literature review was conducted targeting publications between 2012 and 2018 that leveraged IBSs for public health surveillance, outbreak forecasting, disease characterization, diagnosis prediction, content analysis, and health-topic identification. The search results were filtered according to previously defined inclusion and exclusion criteria. Results Spanning a total of 162 publications, we determined infectious diseases to be the preferred case study (108/162, 66.7%). Of the eight categories of IBSs (search queries, social media, news, discussion forums, websites, web encyclopedia, and online obituaries), search queries and social media were applied in 95.1% (154/162) of the reviewed publications. We also identified limitations in representativeness and biased user age groups, as well as high susceptibility to media events by search queries, social media, and web encyclopedias. 
Conclusions IBSs are a valuable proxy to study illnesses affecting the general population; however, it is important to characterize which diseases are best suited for the available sources; the literature shows that the level of engagement among online platforms can be a potential indicator. There is a necessity to understand the population’s online behavior; in addition, the exploration of health information dissemination and its content is significantly unexplored. With this information, we can understand how the population communicates about illnesses online and, in the process, benefit public health.
Affiliation(s)
- Joana M Barros
- Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland
- School of Computer Science, National University of Ireland Galway, Galway, Ireland
- Jim Duggan
- School of Computer Science, National University of Ireland Galway, Galway, Ireland
|
26
|
Sadilek A, Hswen Y, Bavadekar S, Shekel T, Brownstein JS, Gabrilovich E. Lymelight: forecasting Lyme disease risk using web search data. NPJ Digit Med 2020; 3:16. [PMID: 32047861 PMCID: PMC7000681 DOI: 10.1038/s41746-020-0222-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/29/2019] [Accepted: 12/19/2019] [Indexed: 02/02/2023] Open
Abstract
Lyme disease is the most common tick-borne disease in the Northern Hemisphere. Existing estimates of Lyme disease spread are delayed a year or more. We introduce Lymelight, a new method for monitoring the incidence of Lyme disease in real time. We use a machine-learned classifier of web search sessions to estimate the number of individuals who search for possible Lyme disease symptoms in a given geographical area for two years, 2014 and 2015. We evaluate Lymelight using the official case count data from CDC and find a 92% correlation (p < 0.001) at county level. Importantly, using web search data allows us not only to assess the incidence of the disease, but also to examine the appropriateness of treatments subsequently searched for by the users. Public health implications of our work include monitoring the spread of vector-borne diseases in a timely and scalable manner, complementing existing approaches through real-time detection, which can enable more timely interventions. Our analysis of treatment searches may also help reduce misdiagnosis of the disease.
Affiliation(s)
- Yulin Hswen
- Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA USA
- Computational Epidemiology Lab, Boston Children’s Hospital, Boston, MA USA
- John S. Brownstein
- Computational Epidemiology Lab, Boston Children’s Hospital, Boston, MA USA
- Department of Pediatrics, Harvard Medical School, Massachusetts, USA
|
27
|
Aiello AE, Renson A, Zivich PN. Social Media- and Internet-Based Disease Surveillance for Public Health. Annu Rev Public Health 2020; 41:101-118. [PMID: 31905322 DOI: 10.1146/annurev-publhealth-040119-094402] [Citation(s) in RCA: 135] [Impact Index Per Article: 27.0] [Indexed: 12/29/2022]
Abstract
Disease surveillance systems are a cornerstone of public health tracking and prevention. This review addresses the use, promise, perils, and ethics of social media- and Internet-based data collection for public health surveillance. Our review highlights untapped opportunities for integrating digital surveillance in public health and current applications that could be improved through better integration, validation, and clarity on rules surrounding ethical considerations. Promising developments include hybrid systems that couple traditional surveillance data with data from search queries, social media posts, and crowdsourcing. In the future, it will be important to identify opportunities for public and private partnerships, train public health experts in data science, reduce biases related to digital data (gathered from Internet use, wearable devices, etc.), and address privacy. We are on the precipice of an unprecedented opportunity to track, predict, and prevent global disease burdens in the population using digital data.
Affiliation(s)
- Allison E Aiello
- Department of Epidemiology, Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7435, USA
- Audrey Renson
- Department of Epidemiology, Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7435, USA
- Paul N Zivich
- Department of Epidemiology, Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7435, USA
|
28
|
Samaras L, García-Barriocanal E, Sicilia MA. Syndromic surveillance using web data: a systematic review. Innovation in Health Informatics 2020. [PMCID: PMC7153324 DOI: 10.1016/b978-0-12-819043-2.00002-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
In recent years, there has been much debate about the evolution of Smart Healthcare systems, particularly about how these systems can help people improve their health by taking advantage of new Information and Communication Technologies (ICT) for early prediction and efficient treatment. The purpose of this study is to provide a systematic review of the current literature on information systems for syndromic surveillance using web data. The items reviewed comprise articles, books, reviews, reports, conference announcements, and dissertations. We used a variation of the PRISMA Statement methodology to conduct the review. The review identifies the relevant papers published from 2004 to 2018 and systematically explores them to extract similarities, gaps, and conclusions from the research done so far. The results presented concern the year, the examined disease, the web data source, the geographic location/country, and the data analysis method used. The results show that influenza is the most examined infectious disease. The internet tools most used are Twitter and Google. Among the geographical areas explored in the published papers, the most examined country is the United States, since many scientists come from this country. The number of articles has grown significantly since 2009. Various statistical methods are used to correlate the data retrieved from the internet with the data from national authorities. The studies reviewed conclude that the Web can be a useful tool for detecting serious epidemics and for creating a syndromic surveillance system, since epidemics can be predicted from web data before they are officially detected in the population.
With the advance of ICT, Smart Healthcare can benefit from the epidemic monitoring and early prediction that such a system provides, improving national or international health strategies and policy decisions. This can be achieved by providing new technology tools that enhance health monitoring systems in line with the innovations of Smart Health or eHealth, including the emerging technologies of the Internet of Things. The challenges and impacts of an electronic system based on internet data span the social, medical, and technological disciplines. These can be further extended to Smart Healthcare, as data streaming can provide real-time information, awareness of epidemics, and alerts for both patients and medical scientists. Finally, these new systems can help improve the standards of human life.
|
29
|
Tideman S, Santillana M, Bickel J, Reis B. Internet search query data improve forecasts of daily emergency department volume. J Am Med Inform Assoc 2019; 26:1574-1583. [PMID: 31730701 PMCID: PMC7647136 DOI: 10.1093/jamia/ocz154] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/20/2019] [Revised: 07/25/2019] [Accepted: 08/06/2019] [Indexed: 11/15/2022] Open
Abstract
OBJECTIVE Emergency departments (EDs) are increasingly overcrowded. Forecasting patient visit volume is challenging. Reliable and accurate forecasting strategies may help improve resource allocation and mitigate the effects of overcrowding. Patterns related to weather, day of the week, season, and holidays have been previously used to forecast ED visits. Internet search activity has proven useful for predicting disease trends and offers a new opportunity to improve ED visit forecasting. This study tests whether Google search data and relevant statistical methods can improve the accuracy of ED volume forecasting compared with traditional data sources. MATERIALS AND METHODS Seven years of historical daily ED arrivals were collected from Boston Children's Hospital. We used data from the public school calendar, National Oceanic and Atmospheric Administration, and Google Trends. Multiple linear models using LASSO (least absolute shrinkage and selection operator) for variable selection were created. The models were trained on 5 years of data and out-of-sample accuracy was judged using multiple error metrics on the final 2 years. RESULTS All data sources added complementary predictive power. Our baseline day-of-the-week model recorded average percent errors of 10.99%. Autoregressive terms, calendar and weather data reduced errors to 7.71%. Search volume data reduced errors to 7.58% theoretically preventing 4 improperly staffed days. DISCUSSION The predictive power provided by the search volume data may stem from the ability to capture population-level interaction with events, such as winter storms and infectious diseases, that traditional data sources alone miss. CONCLUSIONS This study demonstrates that search volume data can meaningfully improve forecasting of ED visit volume and could help improve quality and reduce cost.
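As a rough illustration of the baseline described in this abstract, the sketch below fits a day-of-the-week model (the mean arrivals per weekday in the training period) and scores it with an average percent error on a held-out week. The function names and the arrival counts are hypothetical and synthetic; the abstract does not give the authors' exact error definition, so the mean absolute percent error used here is an assumption.

```python
from collections import defaultdict
from datetime import date, timedelta


def fit_day_of_week_baseline(days, counts):
    """Return the mean training count for each weekday (0 = Monday)."""
    totals = defaultdict(float)
    seen = defaultdict(int)
    for d, c in zip(days, counts):
        totals[d.weekday()] += c
        seen[d.weekday()] += 1
    return {wd: totals[wd] / seen[wd] for wd in totals}


def mean_absolute_percent_error(actual, predicted):
    """Average of |actual - predicted| / actual, expressed in percent."""
    errors = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Synthetic, made-up arrivals (not data from the study): four weeks,
# busier on weekends, with one unusual Thursday in the first week.
start = date(2024, 1, 1)  # a Monday
days = [start + timedelta(days=i) for i in range(28)]
counts = [120 if d.weekday() >= 5 else 100 for d in days]
counts[3] = 110

# Train on the first three weeks, evaluate out of sample on the last week.
train_days, test_days = days[:21], days[21:]
train_counts, test_counts = counts[:21], counts[21:]

baseline = fit_day_of_week_baseline(train_days, train_counts)
forecast = [baseline[d.weekday()] for d in test_days]
mape = mean_absolute_percent_error(test_counts, forecast)
print(f"baseline error on held-out week: {mape:.2f}%")
```

Richer models of the kind the abstract mentions (autoregressive terms, calendar, weather, and search volume features selected with LASSO) would be compared against this baseline using the same out-of-sample error.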
Affiliation(s)
- Sam Tideman
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA
- Mauricio Santillana
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Jonathan Bickel
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Ben Reis
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Predictive Medicine Group, Boston Children’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
|
30
|
Rangarajan P, Mody SK, Marathe M. Forecasting dengue and influenza incidences using a sparse representation of Google trends, electronic health records, and time series data. PLoS Comput Biol 2019; 15:e1007518. [PMID: 31751346 PMCID: PMC6894887 DOI: 10.1371/journal.pcbi.1007518] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 01/07/2019] [Revised: 12/05/2019] [Accepted: 10/29/2019] [Indexed: 12/20/2022] Open
Abstract
Dengue and influenza-like illness (ILI) are two of the leading causes of viral infection in the world and it is estimated that more than half the world’s population is at risk for developing these infections. It is therefore important to develop accurate methods for forecasting dengue and ILI incidences. Although data from multiple sources (such as dengue and ILI case counts, electronic health records and frequency of multiple internet search terms from Google Trends) can improve forecasts, standard time series analysis methods cannot estimate all the parameter values from the limited amount of data available when multiple sources are used. In this paper, we use a computationally efficient implementation of the known variable selection method that we call the Autoregressive Likelihood Ratio (ARLR) method. This method combines sparse representation of time series data, electronic health records data (for ILI) and Google Trends data to forecast dengue and ILI incidences. This sparse representation method uses an algorithm that maximizes an appropriate likelihood ratio at every step. Using numerical experiments, we demonstrate that our method recovers the underlying sparse model much more accurately than the lasso method. We apply our method to dengue case count data from five countries/states: Brazil, Mexico, Singapore, Taiwan, and Thailand and to ILI case count data from the United States. Numerical experiments show that our method outperforms existing time series forecasting methods in forecasting the dengue and ILI case counts. In particular, our method gives an 18 percent forecast error reduction over a leading method that also uses data from multiple sources. It also performs better than other methods in predicting the peak value of the case count and the peak time. Dengue and influenza-like illness (ILI) are leading causes of viral infection in the world and hence it is important to develop accurate methods for forecasting their incidence.
We use the Autoregressive Likelihood Ratio method, which is a computationally efficient implementation of the variable selection method, in order to obtain a sparse (non-lasso) representation of time series, Google Trends and electronic health records (for ILI) data. This method is used to forecast dengue incidence in five countries/states and ILI incidence in the USA. We show that this method outperforms existing time series methods in forecasting these diseases. The method is general and can also be used to forecast other diseases.
Affiliation(s)
- Prashant Rangarajan
- Departments of Computer Science and Mathematics, Birla Institute of Technology and Science, Pilani, India
- Sandeep K. Mody
- Department of Mathematics, Indian Institute of Science, Bangalore, India
- Madhav Marathe
- Department of Computer Science, Network, Simulation Science and Advanced Computing Division, Biocomplexity Institute, University of Virginia, Charlottesville, Virginia, United States of America
|
31
|
Kim M, Yune S, Chang S, Jung Y, Sa SO, Han HW. The Fever Coach Mobile App for Participatory Influenza Surveillance in Children: Usability Study. JMIR Mhealth Uhealth 2019; 7:e14276. [PMID: 31625946 PMCID: PMC6823603 DOI: 10.2196/14276] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 04/06/2019] [Revised: 07/14/2019] [Accepted: 08/09/2019] [Indexed: 11/13/2022] Open
Abstract
Background Effective surveillance of influenza requires a broad network of health care providers actively reporting cases of influenza-like illnesses and positive laboratory results. Not only is this traditional surveillance system costly to establish and maintain but there is also a time lag between a change in influenza activity and its detection. A new surveillance system that is both reliable and timely will help public health officials to effectively control an epidemic and mitigate the burden of the disease. Objective This study aimed to evaluate the use of parent-reported data of febrile illnesses in children submitted through the Fever Coach app in real-time surveillance of influenza activities. Methods Fever Coach is a mobile app designed to help parents and caregivers manage fever in young children, currently mainly serviced in South Korea. The app analyzes data entered by a caregiver and provides tailored information for care of the child based on the child’s age, sex, body weight, body temperature, and accompanying symptoms. Using the data submitted to the app during the 2016-2017 influenza season, we built a regression model that monitors influenza incidence for the 2017-2018 season and validated the model by comparing the predictions with the public influenza surveillance data from the Korea Centers for Disease Control and Prevention (KCDC). Results During the 2-year study period, 70,203 diagnosis data, including 7702 influenza reports, were submitted. There was a significant correlation between the influenza activity predicted by Fever Coach and that reported by KCDC (Spearman ρ=0.878; P<.001). Using this model, the influenza epidemic in the 2017-2018 season was detected 10 days before the epidemic alert announced by KCDC. Conclusions The Fever Coach app successfully collected data from 7.73% (207,699/2,686,580) of the target population by providing care instruction for febrile children. 
These data were used to develop a model that accurately estimated influenza activity measured by the central government agency using reports from sentinel facilities in the national surveillance network.
Affiliation(s)
- Sehyo Yune
- Mobile Doctor, Co, Ltd, Seoul, Republic of Korea
- Seyun Chang
- Mobile Doctor, Co, Ltd, Seoul, Republic of Korea
- Yuseob Jung
- Institute of Basic Medical Sciences, Graduate School of Medicine, CHA University, Seongnam-si, Gyeonggi-do, Republic of Korea
- Department of Biomedical Informatics, Graduate School of Medicine, CHA University, Seongnam-si, Gyeonggi-do, Republic of Korea
- Soon Ok Sa
- Institute of Basic Medical Sciences, Graduate School of Medicine, CHA University, Seongnam-si, Gyeonggi-do, Republic of Korea
- Department of Biomedical Informatics, Graduate School of Medicine, CHA University, Seongnam-si, Gyeonggi-do, Republic of Korea
- Hyun Wook Han
- Institute of Basic Medical Sciences, Graduate School of Medicine, CHA University, Seongnam-si, Gyeonggi-do, Republic of Korea
- Department of Biomedical Informatics, Graduate School of Medicine, CHA University, Seongnam-si, Gyeonggi-do, Republic of Korea
- Healthcare Bigdata Center, Bundang CHA General Hospital, Seongnam-si, Republic of Korea
|
32
|
Baltrusaitis K, Vespignani A, Rosenfeld R, Gray J, Raymond D, Santillana M. Differences in Regional Patterns of Influenza Activity Across Surveillance Systems in the United States: Comparative Evaluation. JMIR Public Health Surveill 2019; 5:e13403. [PMID: 31579019 PMCID: PMC6777281 DOI: 10.2196/13403] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/16/2019] [Revised: 07/02/2019] [Accepted: 07/19/2019] [Indexed: 01/30/2023] Open
Abstract
BACKGROUND The Centers for Disease Control and Prevention (CDC) tracks influenza-like illness (ILI) using information on patient visits to health care providers through the Outpatient Influenza-like Illness Surveillance Network (ILINet). As participation in this system is voluntary, the composition, coverage, and consistency of health care reports vary from state to state, leading to different measures of ILI activity between regions. The degree to which these measures reflect actual differences in influenza activity or systematic differences in the methods used to collect and aggregate the data is unclear. OBJECTIVE The objective of our study was to qualitatively and quantitatively compare national and region-specific ILI activity in the United States across 4 surveillance data sources (CDC ILINet, Flu Near You [FNY], athenahealth, and HealthTweets.org) to determine whether these data sources, commonly used as input in influenza modeling efforts, show geographical patterns that are similar to those observed in CDC ILINet's data. We also compared the yearly percentage of FNY participants who sought health care for ILI symptoms across geographical areas. METHODS We compared the national and regional 2018-2019 ILI activity baselines, calculated using noninfluenza weeks from previous years, for each surveillance data source. We also compared measures of ILI activity across geographical areas during 3 influenza seasons, 2015-2016, 2016-2017, and 2017-2018. Geographical differences in weekly ILI activity within each data source were also assessed using relative mean differences and time series heatmaps. National and regional age-adjusted health care-seeking percentages were calculated for each influenza season by dividing the number of FNY participants who sought medical care for ILI symptoms by the total number of ILI reports within an influenza season.
Pearson correlations were used to assess the association between the health care-seeking percentages and baselines for each surveillance data source. RESULTS We observed consistent differences in ILI activity across geographical areas for CDC ILINet and athenahealth data. ILI activity for FNY displayed little variation across geographical areas, whereas differences in ILI activity for HealthTweets.org were associated with the total number of tweets within a geographical area. The percentage of FNY participants who sought health care for ILI symptoms differed slightly across geographical areas, and these percentages were positively correlated with CDC ILINet and athenahealth baselines. CONCLUSIONS Our findings suggest that differences in ILI activity across geographical areas as reported by a given surveillance system may not accurately reflect true differences in the prevalence of ILI. Instead, these differences may reflect systematic collection and aggregation biases that are particular to each system and consistent across influenza seasons. These findings are potentially relevant in the real-time analysis of the influenza season and in the definition of unbiased forecast models.
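The two calculations named in the methods above can be sketched in a few lines: a health care-seeking percentage (participants who sought care divided by total ILI reports) and a Pearson correlation between those percentages and a surveillance baseline. This is a simplified illustration only. It omits the age adjustment the abstract mentions, and the region counts and baseline values are invented for the example.

```python
import math


def care_seeking_percentage(sought_care, ili_reports):
    """Percent of ILI reports in which the participant sought medical care."""
    if ili_reports == 0:
        raise ValueError("no ILI reports for this region and season")
    return 100.0 * sought_care / ili_reports


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical per-region counts: (participants who sought care, ILI reports).
regions = [(30, 100), (45, 150), (20, 80), (66, 200)]
percentages = [care_seeking_percentage(s, n) for s, n in regions]

# Hypothetical ILI activity baselines (percent of visits) for the same regions.
baselines = [2.0, 2.1, 1.5, 2.6]

r = pearson_r(percentages, baselines)
print(percentages, round(r, 3))
```

A positive coefficient here would correspond to the pattern the abstract reports, in which regions with higher care-seeking percentages also show higher baselines.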
Affiliation(s)
- Kristin Baltrusaitis
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States
- Roni Rosenfeld
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, United States
- Josh Gray
- athenaResearch at athenahealth, Watertown, MA, United States
- Dorrie Raymond
- athenaResearch at athenahealth, Watertown, MA, United States
- Mauricio Santillana
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
- Department of Pediatrics, Harvard Medical School, Boston, MA, United States
|
33
|
Su K, Xu L, Li G, Ruan X, Li X, Deng P, Li X, Li Q, Chen X, Xiong Y, Lu S, Qi L, Shen C, Tang W, Rong R, Hong B, Ning Y, Long D, Xu J, Shi X, Yang Z, Zhang Q, Zhuang Z, Zhang L, Xiao J, Li Y. Forecasting influenza activity using self-adaptive AI model and multi-source data in Chongqing, China. EBioMedicine 2019; 47:284-292. [PMID: 31477561 PMCID: PMC6796527 DOI: 10.1016/j.ebiom.2019.08.024] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 05/24/2019] [Revised: 08/09/2019] [Accepted: 08/09/2019] [Indexed: 02/05/2023] Open
Abstract
Background Early detection of influenza activity followed by timely response is a critical component of preparedness for seasonal influenza epidemic and influenza pandemic. However, most relevant studies were conducted at the regional or national level with regular seasonal influenza trends. There are few feasible strategies to forecast influenza activity at the local level with irregular trends. Methods Multi-source electronic data, including historical percentage of influenza-like illness (ILI%), weather data, Baidu search index and Sina Weibo data of Chongqing, China, were collected and integrated into an innovative Self-adaptive AI Model (SAAIM), which was constructed by integrating Seasonal Autoregressive Integrated Moving Average model and XGBoost model using a self-adaptive weight adjustment mechanism. SAAIM was applied to ILI% forecast in Chongqing from 2017 to 2018, of which the performance was compared with three previously available models on forecasting. Findings ILI% showed an irregular seasonal trend from 2012 to 2018 in Chongqing. Compared with three reference models, SAAIM achieved the best performance on forecasting ILI% of Chongqing with the mean absolute percentage error (MAPE) of 11·9%, 7·5%, and 11·9% during the periods of the year 2014–2016, 2017, and 2018 respectively. Among the three categories of source data, historical influenza activity contributed the most to the forecast accuracy by decreasing the MAPE by 19·6%, 43·1%, and 11·1%, followed by weather information (MAPE reduced by 3·3%, 17·1%, and 2·2%), and Internet-related public sentiment data (MAPE reduced by 1·1%, 0·9%, and 1·3%). Interpretation Accurate influenza forecast in areas with irregular seasonal influenza trends can be made by SAAIM with multi-source electronic data.
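The abstract does not spell out how the self-adaptive weight adjustment works, so the following is only one plausible sketch of the general idea: two component forecasts are blended with weights that are inversely proportional to each model's recent mean absolute error, so the model that has been more accurate lately contributes more. The function names, the window length, and the ILI% values are all assumptions, and plain lists stand in for the output of the seasonal ARIMA and XGBoost components.

```python
def mean_abs_error(actual, forecast):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)


def adaptive_weights(actual, forecast_a, forecast_b, window=4, eps=1e-9):
    """Weights for models A and B, inversely proportional to recent error."""
    mae_a = mean_abs_error(actual[-window:], forecast_a[-window:])
    mae_b = mean_abs_error(actual[-window:], forecast_b[-window:])
    inv_a = 1.0 / (mae_a + eps)  # eps avoids division by zero
    inv_b = 1.0 / (mae_b + eps)
    total = inv_a + inv_b
    return inv_a / total, inv_b / total


def combine(next_a, next_b, weights):
    """Weighted blend of the two models' next-step forecasts."""
    w_a, w_b = weights
    return w_a * next_a + w_b * next_b


# Hypothetical ILI% history and past forecasts from two stand-in models.
observed = [3.0, 3.4, 4.1, 4.8]
model_a_past = [3.2, 3.6, 4.3, 5.0]  # off by 0.2 each week
model_b_past = [3.6, 4.0, 4.7, 5.4]  # off by 0.6 each week

weights = adaptive_weights(observed, model_a_past, model_b_past)
blended = combine(5.2, 5.6, weights)
print(weights, blended)
```

Recomputing the weights each week from a sliding window is what makes the blend adaptive; a longer window gives steadier weights, while a shorter one reacts faster to a change in which model is performing better.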
Affiliation(s)
- Kun Su
- Department of Epidemiology, College of Preventive Medicine, Army Medical University (Third Military Medical University), Chongqing, People's Republic of China; Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
- Liang Xu
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Guanqiao Li
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China
- Xiaowen Ruan
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Xian Li
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Pan Deng
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Xinmi Li
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Qin Li
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
- Xianxian Chen
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Yu Xiong
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
- Shaofeng Lu
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Li Qi
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
- Chaobo Shen
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Wenge Tang
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
- Rong Rong
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
- Boran Hong
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Yi Ning
- Meinian Institute of Health, Beijing, People's Republic of China
- Dongyan Long
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Jiaying Xu
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Xuanling Shi
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China
- Zhihong Yang
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Qi Zhang
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China
- Ziqi Zhuang
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Linqi Zhang
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China
- Jing Xiao
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Yafei Li
- Department of Epidemiology, College of Preventive Medicine, Army Medical University (Third Military Medical University), Chongqing, People's Republic of China
|
34
|
Soliman M, Lyubchich V, Gel YR. Complementing the power of deep learning with statistical model fusion: Probabilistic forecasting of influenza in Dallas County, Texas, USA. Epidemics 2019; 28:100345. [PMID: 31182294 DOI: 10.1016/j.epidem.2019.05.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 10/06/2018] [Revised: 03/08/2019] [Accepted: 05/06/2019] [Indexed: 02/06/2023] Open
Abstract
Influenza is one of the main causes of death, not only in the USA but worldwide. Its significant economic and public health impacts necessitate development of accurate and efficient algorithms for forecasting of any upcoming influenza outbreaks. Most currently available methods for influenza prediction are based on parametric time series and regression models that impose restrictive and often unverifiable assumptions on the data. In turn, more flexible machine learning models and, particularly, deep learning tools whose utility is proven in a wide range of disciplines, remain largely under-explored in epidemiological forecasting. We study the seasonal influenza in Dallas County by evaluating the forecasting ability of deep learning with feedforward neural networks as well as performance of more conventional statistical models, such as beta regression, autoregressive integrated moving average (ARIMA), least absolute shrinkage and selection operators (LASSO), and non-parametric multivariate adaptive regression splines (MARS) models for one week and two weeks ahead forecasting. Furthermore, we assess forecasting utility of Google search queries and meteorological data as exogenous predictors of influenza activity. Finally, we develop a probabilistic forecasting of influenza in Dallas County by fusing all the considered models using Bayesian model averaging.
Affiliation(s)
- Marwah Soliman
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
- Vyacheslav Lyubchich
- Chesapeake Biological Laboratory, University of Maryland Center for Environmental Science, Solomons, MD, USA
- Yulia R Gel
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
|
35
|
He H, Henderson J, Ho JC. Distributed Tensor Decomposition for Large Scale Health Analytics. Proceedings of the International World Wide Web Conference 2019; 2019:659-669. [PMID: 31198910 PMCID: PMC6563812 DOI: 10.1145/3308558.3313548] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
In the past few decades, there has been rapid growth in quantity and variety of healthcare data. These large sets of data are usually high dimensional (e.g. patients, their diagnoses, and medications to treat their diagnoses) and cannot be adequately represented as matrices. Thus, many existing algorithms cannot analyze them. To accommodate these high dimensional data, tensor factorization, which can be viewed as a higher-order extension of methods like PCA, has attracted much attention and emerged as a promising solution. However, tensor factorization is a computationally expensive task, and existing methods developed to factor large tensors are not flexible enough for real-world situations. To address this scaling problem more efficiently, we introduce SGranite, a distributed, scalable, and sparse tensor factorization method fit through stochastic gradient descent. SGranite offers three contributions: (1) Scalability: it employs a block partitioning and parallel processing design and thus scales to large tensors, (2) Accuracy: we show that our method can achieve results faster without sacrificing the quality of the tensor decomposition, and (3) FlexibleConstraints: we show our approach can encompass various kinds of constraints including l2 norm, l1 norm, and logistic regularization. We demonstrate SGranite's capabilities in two real-world use cases. In the first, we use Google searches for flu-like symptoms to characterize and predict influenza patterns. In the second, we use SGranite to extract clinically interesting sets (i.e., phenotypes) of patients from electronic health records. Through these case studies, we show SGranite has the potential to be used to rapidly characterize, predict, and manage large multimodal datasets, thereby promising a novel, data-driven solution that can benefit very large segments of the population.
Collapse
Affiliation(s)
- Huan He
- Emory University, Atlanta, Georgia
|
36
|
Clemente L, Lu F, Santillana M. Improved Real-Time Influenza Surveillance: Using Internet Search Data in Eight Latin American Countries. JMIR Public Health Surveill 2019; 5:e12214. [PMID: 30946017 PMCID: PMC6470460 DOI: 10.2196/12214] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2018] [Revised: 02/11/2019] [Accepted: 02/15/2019] [Indexed: 01/18/2023] Open
Abstract
Background Novel influenza surveillance systems that leverage Internet-based real-time data sources including Internet search frequencies, social-network information, and crowd-sourced flu surveillance tools have shown improved accuracy over the past few years in data-rich countries like the United States. These systems not only track flu activity accurately, but they also report flu estimates a week or more ahead of the publication of reports produced by healthcare-based systems, such as those implemented and managed by the Centers for Disease Control and Prevention. Previous work has shown that novel flu surveillance systems, like Google Flu Trends (GFT), have not yet delivered acceptable flu estimates in developing countries in Latin America. Objective The aim of this study was to show that recent methodological improvements on the use of Internet search engine information to track diseases can lead to improved retrospective flu estimates in multiple countries in Latin America. Methods A machine learning-based methodology that uses flu-related Internet search activity and historical information to monitor flu activity, named ARGO (AutoRegression with Google search), was extended to generate flu predictions for 8 Latin American countries (Argentina, Bolivia, Brazil, Chile, Mexico, Paraguay, Peru, and Uruguay) for the period from January 2012 to December 2016. These retrospective (out-of-sample) influenza activity predictions were compared with historically observed suspected flu cases in each country, as reported by Flunet, an influenza surveillance database maintained by the World Health Organization. For a baseline comparison, retrospective (out-of-sample) flu estimates were produced for the same time period using autoregressive models that only leverage historical flu activity information. 
Results Our results show that ARGO-like models outperform autoregressive models in 6 out of 8 countries in the 2012-2016 time period. Moreover, ARGO significantly improves on historical flu estimates produced by the now discontinued GFT for the time period of 2012-2015, where GFT information is publicly available. Conclusions We demonstrate here that a self-correcting machine learning method, leveraging Internet-based disease-related search activity and historical flu trends, has the potential to produce reliable and timely flu estimates in multiple Latin American countries. This methodology may prove helpful to local public health officials who design and implement interventions aimed at mitigating the effects of influenza outbreaks. Our methodology generally outperforms both the now-discontinued tool GFT, and autoregressive methodologies that exploit only historical flu activity to produce future disease estimates.
Affiliation(s)
- Leonardo Clemente
- School of Engineering and Sciences, Tecnologico de Monterrey, Monterrey, Mexico.
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
- Fred Lu
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
- Mauricio Santillana
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States.
- Department of Pediatrics, Harvard Medical School, Boston, MA, United States
|
37
|
Ning S, Yang S, Kou SC. Accurate regional influenza epidemics tracking using Internet search data. Sci Rep 2019; 9:5238. [PMID: 30918276 PMCID: PMC6437143 DOI: 10.1038/s41598-019-41559-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2018] [Accepted: 03/12/2019] [Indexed: 12/12/2022] Open
Abstract
Accurate, high-resolution tracking of influenza epidemics at the regional level helps public health agencies make informed and proactive decisions, especially in the face of outbreaks. Internet users' online searches offer great potential for the regional tracking of influenza. However, due to the complex data structure and reduced quality of Internet data at the regional level, few established methods provide satisfactory performance. In this article, we propose a novel method named ARGO2 (2-step Augmented Regression with GOogle data) that efficiently combines publicly available Google search data at different resolutions (national and regional) with traditional influenza surveillance data from the Centers for Disease Control and Prevention (CDC) for accurate, real-time regional tracking of influenza. ARGO2 gives very competitive performance across all US regions compared with available Internet-data-based regional influenza tracking methods, and it has achieved 30% error reduction over the best alternative method that we numerically tested for the period of March 2009 to March 2018. ARGO2 is reliable and robust, with the flexibility to incorporate additional information from other sources and resolutions, making it a powerful tool for regional influenza tracking, and potentially for tracking other social, economic, or public health events at the regional or local level.
Affiliation(s)
- Shaoyang Ning
- Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, 02138, MA, USA
- Shihao Yang
- Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, 02138, MA, USA
- S C Kou
- Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, 02138, MA, USA.
|
38
|
Avgeris M, Spatharakis D, Dechouniotis D, Kalatzis N, Roussaki I, Papavassiliou S. Where There Is Fire There Is SMOKE: A Scalable Edge Computing Framework for Early Fire Detection. SENSORS 2019; 19:s19030639. [PMID: 30717464 PMCID: PMC6387399 DOI: 10.3390/s19030639] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/29/2018] [Revised: 01/23/2019] [Accepted: 01/26/2019] [Indexed: 11/16/2022]
Abstract
A Cyber-Physical Social System (CPSS) tightly integrates computer systems with the physical world and human activities. In this article, a three-level CPSS for early fire detection is presented to assist public authorities to promptly identify and act on emergency situations. At the bottom level, the system's architecture involves IoT nodes enabled with sensing and forest monitoring capabilities. Additionally, in this level, the crowd sensing paradigm is exploited to aggregate environmental information collected by end user devices present in the area of interest. Since the IoT nodes suffer from limited computational energy resources, an Edge Computing Infrastructure, at the middle level, facilitates the offloaded data processing regarding possible fire incidents. At the top level, a decision-making service deployed on Cloud nodes integrates data from various sources, including users' information on social media, and evaluates the situation criticality. In our work, a dynamic resource scaling mechanism for the Edge Computing Infrastructure is designed to address the demanding Quality of Service (QoS) requirements of this IoT-enabled time and mission critical application. The experimental results indicate that the vertical and horizontal scaling on the Edge Computing layer is beneficial for both the performance and the energy consumption of the IoT nodes.
Affiliation(s)
- Marios Avgeris
- School of Electrical and Computer Engineering, National Technical University of Athens-NTUA, GR 157 80 Zografou, Greece.
- Dimitrios Spatharakis
- School of Electrical and Computer Engineering, National Technical University of Athens-NTUA, GR 157 80 Zografou, Greece.
- Dimitrios Dechouniotis
- School of Electrical and Computer Engineering, National Technical University of Athens-NTUA, GR 157 80 Zografou, Greece.
- Nikos Kalatzis
- School of Electrical and Computer Engineering, National Technical University of Athens-NTUA, GR 157 80 Zografou, Greece.
- Ioanna Roussaki
- School of Electrical and Computer Engineering, National Technical University of Athens-NTUA, GR 157 80 Zografou, Greece.
- Symeon Papavassiliou
- School of Electrical and Computer Engineering, National Technical University of Athens-NTUA, GR 157 80 Zografou, Greece.
|
39
|
Kalatzis N, Routis G, Marinellis Y, Avgeris M, Roussaki I, Papavassiliou S, Anagnostou M. Semantic Interoperability for IoT Platforms in Support of Decision Making: An Experiment on Early Wildfire Detection. SENSORS 2019; 19:s19030528. [PMID: 30691223 PMCID: PMC6387244 DOI: 10.3390/s19030528] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/17/2018] [Revised: 01/16/2019] [Accepted: 01/19/2019] [Indexed: 11/28/2022]
Abstract
One of the main obstacles towards the promotion of IoT adoption and innovation is data interoperability. Facilitating cross-domain interoperability is expected to be the core element for the realisation of the next generation of the IoT computing paradigm that is already taking shape under the name of Internet of Everything (IoE). In this article, an analysis of the current status of IoT semantic interoperability is presented that leads to the identification of a set of generic requirements that act as fundamental design principles for the specification of interoperability enabling solutions. In addition, an extension of the NGSIv2 data model and API (de-facto) standards is proposed, aiming to bridge the gap between IoT and social media and hence to integrate user communities with cyber-physical systems. These specifications have been utilised for the implementation of the IoT2Edge interoperability enabling mechanism, which is evaluated within the context of a catastrophic wildfire incident that took place in Greece in July 2018. Weather data, social media activity, video recordings from the fire, sensor measurements and satellite data, linked to the location and the time of this fire incident, have been collected, modeled in a uniform manner and fed to an early fire detection decision support system. The findings of the experiment certify that achieving minimum data interoperability with light-weight, plug-n-play mechanisms can be realised with significant benefits for our society.
Affiliation(s)
- Nikos Kalatzis
- Institute of Communication and Computer Systems, 15773 Athens, Greece.
- George Routis
- Institute of Communication and Computer Systems, 15773 Athens, Greece.
- Marios Avgeris
- Institute of Communication and Computer Systems, 15773 Athens, Greece.
- Ioanna Roussaki
- Institute of Communication and Computer Systems, 15773 Athens, Greece.
|
40
|
Lu FS, Hattab MW, Clemente CL, Biggerstaff M, Santillana M. Improved state-level influenza nowcasting in the United States leveraging Internet-based data and network approaches. Nat Commun 2019; 10:147. [PMID: 30635558 PMCID: PMC6329822 DOI: 10.1038/s41467-018-08082-0] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2018] [Accepted: 12/12/2018] [Indexed: 12/01/2022] Open
Abstract
In the presence of health threats, precision public health approaches aim to provide targeted, timely, and population-specific interventions. Accurate surveillance methodologies that can estimate infectious disease activity ahead of official healthcare-based reports, at relevant spatial resolutions, are important for achieving this goal. Here we introduce a methodological framework which dynamically combines two distinct influenza tracking techniques, using an ensemble machine learning approach, to achieve improved state-level influenza activity estimates in the United States. The two predictive techniques behind the ensemble utilize (1) a self-correcting statistical method combining influenza-related Google search frequencies, information from electronic health records, and historical flu trends within each state, and (2) a network-based approach leveraging spatio-temporal synchronicities observed in historical influenza activity across states. The ensemble considerably outperforms each component method in addition to previously proposed state-specific methods for influenza tracking, with higher correlations and lower prediction errors.
Affiliation(s)
- Fred S Lu
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02115, USA.
- Mohammad W Hattab
- Wyss Institute for Biologically Inspired Engineering, Harvard Medical School, Boston, MA, 02115, USA
- Matthew Biggerstaff
- Influenza Division, National Center for Immunization and Respiratory Disease, Centers for Disease Control and Prevention, Atlanta, GA, 30333, USA
- Mauricio Santillana
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02115, USA.
- Department of Pediatrics, Harvard Medical School, Boston, MA, 02115, USA.
|
41
|
Lipsitch M, Santillana M. Enhancing Situational Awareness to Prevent Infectious Disease Outbreaks from Becoming Catastrophic. Curr Top Microbiol Immunol 2019; 424:59-74. [DOI: 10.1007/82_2019_172] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]
|
42
|
Mavian C, Dulcey M, Munoz O, Salemi M, Vittor AY, Capua I. Islands as Hotspots for Emerging Mosquito-Borne Viruses: A One-Health Perspective. Viruses 2018; 11:E11. [PMID: 30585228 PMCID: PMC6356932 DOI: 10.3390/v11010011] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2018] [Revised: 12/18/2018] [Accepted: 12/18/2018] [Indexed: 02/08/2023] Open
Abstract
During the past ten years, an increasing number of arbovirus outbreaks have affected tropical islands worldwide. We examined the available literature in peer-reviewed journals, from the second half of the 20th century until 2018, with the aim of gathering an overall picture of the emergence of arboviruses in these islands. In addition, we included information on environmental and social drivers specific to island settings that can facilitate the emergence of outbreaks. Within the context of the One Health approach, our review highlights how the emergence of arboviruses in tropical islands is linked to the complex interplay between their unique ecological settings and the recent changes in local and global sociodemographic patterns. We also advocate for greater coordination between stakeholders in developing novel prevention and mitigation approaches for an intractable problem.
Affiliation(s)
- Carla Mavian
- Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32611, USA.
- Emerging Pathogens Institute University of Florida, Gainesville, FL 32611, USA.
- Melissa Dulcey
- Emerging Pathogens Institute University of Florida, Gainesville, FL 32611, USA.
- Department of Environmental and Global Health, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32611, USA.
- Olga Munoz
- Emerging Pathogens Institute University of Florida, Gainesville, FL 32611, USA.
- Department of Environmental and Global Health, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32611, USA.
- One Health Center of Excellence, University of Florida, Gainesville, FL 32611, USA.
- Marco Salemi
- Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32611, USA.
- Emerging Pathogens Institute University of Florida, Gainesville, FL 32611, USA.
- Amy Y Vittor
- Emerging Pathogens Institute University of Florida, Gainesville, FL 32611, USA.
- Division of Infectious Diseases and Global Medicine, Department of Medicine, College of Medicine, University of Florida, Gainesville, FL 32611, USA.
- Ilaria Capua
- Emerging Pathogens Institute University of Florida, Gainesville, FL 32611, USA.
- One Health Center of Excellence, University of Florida, Gainesville, FL 32611, USA.
|
43
|
Tan Q, Ma AJ, Deng H, Wong VWS, Tse YK, Yip TCF, Wong GLH, Ching JYL, Chan FKL, Yuen PC. A Hybrid Residual Network and Long Short-Term Memory Method for Peptic Ulcer Bleeding Mortality Prediction. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018; 2018:998-1007. [PMID: 30815143 PMCID: PMC6371275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
The prediction of patient mortality, which can detect high-risk patients, is a significant yet challenging problem in medical informatics. Thanks to the wide adoption of electronic health records (EHRs), many data-driven methods have been proposed to forecast mortality. However, most existing methods do not consider correlations between static and dynamic data, which contain significant information about mutual influences between these data. In this paper, we utilize a deep Residual Network (ResNet) consisting of many convolution units, which can jointly analyze different variables, to capture correlation information in and between static and dynamic variables. Furthermore, the Long Short-Term Memory (LSTM) method is used to extract temporal dependencies information from dynamic data. Finally, a deep fusion method is used to integrate these different types of information to improve mortality prediction. Experiment results on Peptic Ulcer Bleeding (PUB) mortality prediction show that the proposed method outperforms existing methods and achieves an AUC (area under the receiver operating characteristic curve) score of 0.9353.
Affiliation(s)
- Andy Jinhua Ma
- Hong Kong Baptist University, Hong Kong
- Sun Yat-Sen University, Guangzhou, China
- Huiqi Deng
- Hong Kong Baptist University, Hong Kong
- Sun Yat-Sen University, Guangzhou, China
- Yee-Kit Tse
- The Chinese University of Hong Kong, Hong Kong
|
44
|
Machine-learned epidemiology: real-time detection of foodborne illness at scale. NPJ Digit Med 2018; 1:36. [PMID: 31304318 PMCID: PMC6550174 DOI: 10.1038/s41746-018-0045-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2018] [Revised: 07/20/2018] [Accepted: 07/26/2018] [Indexed: 11/09/2022] Open
Abstract
Machine learning has become an increasingly powerful tool for solving complex problems, and its application in public health has been underutilized. The objective of this study is to test the efficacy of a machine-learned model of foodborne illness detection in a real-world setting. To this end, we built FINDER, a machine-learned model for real-time detection of foodborne illness using anonymous and aggregated web search and location data. We computed the fraction of people who visited a particular restaurant and later searched for terms indicative of food poisoning to identify potentially unsafe restaurants. We used this information to focus restaurant inspections in two cities and demonstrated that FINDER improves the accuracy of health inspections; restaurants identified by FINDER are 3.1 times as likely to be deemed unsafe during the inspection as restaurants identified by existing methods. Additionally, FINDER enables us to ascertain previously intractable epidemiological information, for example, in 38% of cases the restaurant potentially causing food poisoning was not the last one visited, which may explain the lower precision of complaint-based inspections. We found that FINDER is able to reliably identify restaurants that have an active lapse in food safety, allowing for implementation of corrective actions that would prevent the potential spread of foodborne illness.
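The core signal described in this abstract, the fraction of visitors to a restaurant who later searched for food-poisoning terms, can be illustrated with a minimal sketch. The record layout, the restaurant labels, and the plain ratio used here are hypothetical; the published model works on anonymous, aggregated data and is considerably more elaborate.

```python
from collections import defaultdict

def illness_search_fraction(visits):
    """Return, per restaurant, the share of visits followed by an illness-related search.

    `visits` is an iterable of (restaurant_id, searched_later) pairs, where
    `searched_later` is True if the visitor later searched for terms indicative
    of food poisoning. This record layout is an assumption for illustration.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for restaurant, searched_later in visits:
        totals[restaurant] += 1
        if searched_later:
            flagged[restaurant] += 1
    return {r: flagged[r] / totals[r] for r in totals}

# Made-up visit records for two hypothetical restaurants.
visits = [
    ("A", True), ("A", False), ("A", False), ("A", False),
    ("B", False), ("B", False),
]
fractions = illness_search_fraction(visits)

# Restaurants with the highest fraction would be inspected first.
ranked = sorted(fractions, key=fractions.get, reverse=True)
```

A real deployment would also need to account for small visit counts, since a single flagged visit at a rarely visited restaurant produces a misleadingly high fraction.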
|
45
|
Singh S. Alignment-Free Analyses of Nucleic Acid Sequences Using Graphical Representation (with Special Reference to Pandemic Bird Flu and Swine Flu). Synth Biol (Oxf) 2018. [PMCID: PMC7121243 DOI: 10.1007/978-981-10-8693-9_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022] Open
Abstract
The exponential growth in databases of bio-molecular sequences has spawned many approaches to storage, retrieval, classification and analysis requirements. Alignment-free techniques such as graphical representations and numerical characterisation (GRANCH) methods have enabled some detailed analyses of large sequences and found a number of different applications in the eukaryotic and prokaryotic domain. In particular, recalling the history of pandemic influenza in brief, we have followed the progress of viral infections such as bird flu of 1997 onwards and determined that the virus can spread conserved over space and time, that influenza virus can undergo fairly conspicuous recombination-like events in segmented genes, and that certain segments of the neuraminidase and hemagglutinin surface proteins remain conserved and can be targeted for peptide vaccines. We recount in some detail a few of the representative GRANCH techniques to provide a glimpse of how these methods are used in formulating quantitative sequence descriptors to analyse DNA, RNA and protein sequences to derive meaningful results. Finally, we survey the surveillance techniques with a special reference to how the GRANCH techniques can be used for the purpose and recount the forecasts made of possible metamorphosis of pandemic bird flu to pandemic human-infecting agents.
Affiliation(s)
- Shailza Singh
- Department of Pathogenesis and Cellular Response, National Centre for Cell Science, Computational and Systems Biology Lab, Pune, Maharashtra India
|
46
|
Magumba MA, Nabende P, Mwebaze E. Design Choices for Automated Disease Surveillance in the Social Web. Online J Public Health Inform 2018; 10:e214. [PMID: 30349632 PMCID: PMC6194101 DOI: 10.5210/ojphi.v10i2.9312] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
The social web has emerged as a dominant information architecture accelerating technology innovation on an unprecedented scale. The utility of these developments to public health use cases like disease surveillance, information dissemination, outbreak prediction and so forth has been widely investigated and variously demonstrated in work spanning several published experimental studies and deployed systems. In this paper we provide an overview of automated disease surveillance efforts based on the social web characterized by their different high level design choices regarding functional aspects like user participation and language parsing approaches. We briefly discuss the technical rationale and practical implications of these different choices in addition to the key limitations associated with these systems within the context of operable disease surveillance. We hope this can offer some technical guidance to multi-disciplinary teams on how best to implement, interpret and evaluate disease surveillance programs based on the social web.
Affiliation(s)
- Mark Abraham Magumba
- Department of Information Systems, Makerere University Uganda, College of Computing and Information Sciences
- Peter Nabende
- Department of Information Systems, Makerere University Uganda, College of Computing and Information Sciences
- Ernest Mwebaze
- Department of Computer Science, Makerere University Uganda, College of Computing and Information Sciences
|
47
|
Zhao Y, Xu Q, Chen Y, Tsui KL. Using Baidu index to nowcast hand-foot-mouth disease in China: a meta learning approach. BMC Infect Dis 2018; 18:398. [PMID: 30103690 PMCID: PMC6090735 DOI: 10.1186/s12879-018-3285-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2017] [Accepted: 07/31/2018] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND Hand, foot, and mouth disease (HFMD) has been recognized as one of the leading infectious diseases among children in China, and has caused hundreds of deaths annually since 2008. In China, the reports of monthly HFMD cases usually have a delay of 1-2 months due to the time needed for collecting and processing clinical information. This time lag is far from optimal for policymakers making decisions. To alleviate this information gap, this study uses a meta learning framework and incorporates publicly available Internet-based information (Baidu search queries) for real-time estimation of HFMD cases. METHODS We incorporate the Baidu index into modeling to nowcast the monthly HFMD incidences in Guangxi, Zhejiang, and Henan provinces and the whole of China. We develop a meta learning framework to select an appropriate predictive model based on the statistical and time series meta features. Our proposed approach is assessed for the HFMD cases within the time period from July 2015 to June 2016 using multiple evaluation metrics including root mean squared error (RMSE) and correlation coefficient (Corr). RESULTS For the four areas (the whole of China, Guangxi, Zhejiang, and Henan), our approach is superior to the best competing models, reducing the RMSE by 37, 20, 20, and 30% respectively. Compared with all the alternative predictive methods, our estimates show the strongest correlation with the observations. CONCLUSIONS In this study, the proposed meta learning method significantly improves the HFMD prediction accuracy, demonstrating that: (1) the Internet-based information offers the possibility for effective HFMD nowcasts; (2) the meta learning approach is capable of adapting to a wide variety of data, and enables selecting an appropriate method for improving the nowcasting accuracy.
Affiliation(s)
- Yang Zhao
- Centre for System Informatics Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong Special Administrative Region, People's Republic of China.
- Qinneng Xu
- Department of Systems Engineering and Engineering Management, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong Special Administrative Region, People's Republic of China
- Yupeng Chen
- Department of Systems Engineering and Engineering Management, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong Special Administrative Region, People's Republic of China
- Kwok Leung Tsui
- Centre for System Informatics Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong Special Administrative Region, People's Republic of China.
- Department of Systems Engineering and Engineering Management, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong Special Administrative Region, People's Republic of China
|
48
|
Liang F, Guan P, Wu W, Huang D. Forecasting influenza epidemics by integrating internet search queries and traditional surveillance data with the support vector machine regression model in Liaoning, from 2011 to 2015. PeerJ 2018; 6:e5134. [PMID: 29967755 PMCID: PMC6022725 DOI: 10.7717/peerj.5134] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2018] [Accepted: 06/08/2018] [Indexed: 12/15/2022] Open
Abstract
Background Influenza epidemics pose significant social and economic challenges in China. Internet search query data have been identified as a valuable source for the detection of emerging influenza epidemics. However, the selection of the search queries and the adoption of prediction methods are crucial challenges when it comes to improving predictions. The purpose of this study was to explore the application of the Support Vector Machine (SVM) regression model in merging search engine query data and traditional influenza data. Methods The official monthly reported number of influenza cases in Liaoning province in China was acquired from the China National Scientific Data Center for Public Health from January 2011 to December 2015. Based on Baidu Index, a publicly available search engine database, search queries potentially related to influenza over the corresponding period were identified. An SVM regression model was built to be used for predictions, and the choice of three parameters (C, γ, ε) in the SVM regression model was determined by leave-one-out cross-validation (LOOCV) during the model construction process. The model’s performance was evaluated using metrics including Root Mean Square Error, Root Mean Square Percentage Error and Mean Absolute Percentage Error. Results In total, 17 search queries related to influenza were generated through the initial query selection approach and were adopted to construct the SVM regression model, including nine queries in the same month, three queries at a lag of one month, one query at a lag of two months and four queries at a lag of three months. The SVM model performed well with the parameters (C = 2, γ = 0.005, ɛ = 0.0001), based on the ensemble data integrating the influenza surveillance data and Baidu search query data. 
Conclusions The results demonstrated the feasibility of using internet search engine query data as the complementary data source for influenza surveillance and the efficiency of SVM regression model in tracking the influenza epidemics in Liaoning.
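The three error metrics named in this abstract can be computed as in the minimal sketch below. The definitions are the standard textbook ones, expressed as fractions rather than percentages, and the sample case counts are made up for illustration; neither is taken from the paper.

```python
import math

def rmse(actual, predicted):
    """Root mean square error, in the same units as the data."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def rmspe(actual, predicted):
    """Root mean square percentage error as a fraction (actual values must be non-zero)."""
    n = len(actual)
    return math.sqrt(sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted)) / n)

def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (actual values must be non-zero)."""
    n = len(actual)
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

# Hypothetical monthly case counts and forecasts, for illustration only.
observed = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]

print(rmse(observed, forecast), rmspe(observed, forecast), mape(observed, forecast))
```

RMSE is dominated by errors in high-incidence months, while the two percentage-based metrics weight every month equally regardless of case volume, which is one reason studies of this kind tend to report several metrics side by side.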
Affiliation(s)
- Feng Liang
- Department of Epidemiology, School of Public Health, China Medical University, Shenyang, Liaoning, China
- Peng Guan
- Department of Epidemiology, School of Public Health, China Medical University, Shenyang, Liaoning, China
- Wei Wu
- Department of Epidemiology, School of Public Health, China Medical University, Shenyang, Liaoning, China
- Desheng Huang
- Department of Epidemiology, School of Public Health, China Medical University, Shenyang, Liaoning, China.
- Department of Mathematics, School of Fundamental Sciences, China Medical University, Shenyang, Liaoning, China
|
49
|
Brooks LC, Farrow DC, Hyun S, Tibshirani RJ, Rosenfeld R. Nonmechanistic forecasts of seasonal influenza with iterative one-week-ahead distributions. PLoS Comput Biol 2018; 14:e1006134. [PMID: 29906286 PMCID: PMC6034894 DOI: 10.1371/journal.pcbi.1006134] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2017] [Revised: 07/06/2018] [Accepted: 04/10/2018] [Indexed: 11/18/2022] Open
Abstract
Accurate and reliable forecasts of seasonal epidemics of infectious disease can assist in the design of countermeasures and increase public awareness and preparedness. This article describes two main contributions we made recently toward this goal: a novel approach to probabilistic modeling of surveillance time series based on "delta densities", and an optimization scheme for combining output from multiple forecasting methods into an adaptively weighted ensemble. Delta densities describe the probability distribution of the change between one observation and the next, conditioned on available data; chaining together nonparametric estimates of these distributions yields a model for an entire trajectory. Corresponding distributional forecasts cover more observed events than alternatives that treat the whole season as a unit, and improve upon multiple evaluation metrics when extracting key targets of interest to public health officials. Adaptively weighted ensembles integrate the results of multiple forecasting methods, such as delta density, using weights that can change from situation to situation. We treat selection of optimal weightings across forecasting methods as a separate estimation task, and describe an estimation procedure based on optimizing cross-validation performance. We consider some details of the data generation process, including data revisions and holiday effects, both in the construction of these forecasting methods and when performing retrospective evaluation. The delta density method and an adaptively weighted ensemble of other forecasting methods each improve significantly on the next best ensemble component when applied separately, and achieve even better cross-validated performance when used in conjunction. We submitted real-time forecasts based on these contributions as part of CDC's 2015/2016 FluSight Collaborative Comparison. Among the fourteen submissions that season, this system was ranked by CDC as the most accurate.
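The chaining idea behind delta densities can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: here each one-step change is drawn from past seasons with Gaussian kernel weights based only on how close the past level was to the current level, whereas the paper conditions on more of the available data. The function names, the `bandwidth` parameter, the clamp at zero, and the toy seasons are all assumptions made for illustration.

```python
import math
import random


def delta_pairs(history):
    """Collect (level, change to the next observation) pairs from past seasons."""
    pairs = []
    for season in history:
        for prev, nxt in zip(season, season[1:]):
            pairs.append((prev, nxt - prev))
    return pairs


def sample_delta(current, pairs, bandwidth, rng):
    """Draw one change, favoring past observations whose level was near `current`."""
    # Gaussian kernel weights; the small floor keeps the total weight positive
    # even when `current` is far from every historical level.
    weights = [
        math.exp(-0.5 * ((current - level) / bandwidth) ** 2) + 1e-12
        for level, _ in pairs
    ]
    deltas = [delta for _, delta in pairs]
    return rng.choices(deltas, weights=weights, k=1)[0]


def simulate(last_value, steps, pairs, bandwidth=1.0, seed=0):
    """Chain sampled one-step changes into a single simulated trajectory."""
    rng = random.Random(seed)
    path = [last_value]
    for _ in range(steps):
        step = sample_delta(path[-1], pairs, bandwidth, rng)
        path.append(max(0.0, path[-1] + step))  # assumed: incidence cannot be negative
    return path[1:]


# Toy seasons (made-up numbers) and a three-step-ahead simulation.
seasons = [[1, 2, 4, 3, 1], [1, 3, 5, 2, 1]]
pairs = delta_pairs(seasons)
print(simulate(2.0, steps=3, pairs=pairs, seed=1))
```

Repeating `simulate` with many different seeds yields a sample of trajectories, from which a distributional forecast for targets such as the peak value can be read off empirically.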
Affiliation(s)
- Logan C. Brooks: School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- David C. Farrow: School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Sangwon Hyun: Department of Statistics, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Ryan J. Tibshirani: School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America; Department of Statistics, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Roni Rosenfeld: School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
|
50
|
Dolley S. Big Data's Role in Precision Public Health. Front Public Health 2018; 6:68. [PMID: 29594091 PMCID: PMC5859342 DOI: 10.3389/fpubh.2018.00068] [Citation(s) in RCA: 103] [Impact Index Per Article: 14.7] [Received: 07/25/2017] [Accepted: 02/20/2018] [Indexed: 01/01/2023]
Abstract
Precision public health is an emerging practice to more granularly predict and understand public health risks and customize treatments for more specific and homogeneous subpopulations, often using new data, technologies, and methods. Big data is one element that has consistently helped to achieve these goals, through its ability to deliver to practitioners a volume and variety of structured or unstructured data not previously possible. Big data has enabled more widespread and specific research and trials of stratifying and segmenting populations at risk for a variety of health problems. Examples of success using big data are surveyed in surveillance and signal detection, predicting future risk, targeted interventions, and understanding disease. Using novel big data or big data approaches has risks that remain to be resolved. The continued growth in volume and variety of available data, decreased costs of data capture, and emerging computational methods mean big data success will likely be a required pillar of precision public health into the future. This review article aims to identify the precision public health use cases where big data has added value, identify classes of value that big data may bring, and outline the risks inherent in using big data in precision public health efforts.
|