1
|
Yan Q, Shan S, Zhang B, Sun W, Sun M, Luo Y, Zhao F, Guo X. Monitoring the Relationship between Social Network Status and Influenza Based on Social Media Data. Disaster Med Public Health Prep 2023; 17:e490. [PMID: 37721020 DOI: 10.1017/dmp.2023.117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/19/2023]
Abstract
BACKGROUND This article aims to analyze the relationship between user characteristics on social networks and influenza. METHODS Three specific research questions are investigated: (1) we classify Weibo updates to recognize influenza-related information based on machine learning algorithms and propose a quantitative model for influenza susceptibility in social networks; (2) we adopt the in-degree indicator from complex network theory as a measure of social media status and verify its correlation coefficient with influenza susceptibility; (3) we also apply the LDA topic model to explore users' physical condition from Weibo and calculate its correlation coefficient with influenza susceptibility. From the perspective of social networking status, we analyze and extract influenza-related information from social media, with many advantages including efficiency, low cost, and real-time availability. RESULTS We find a moderate negative correlation between the susceptibility of users to influenza and social network status, while there is a significant positive correlation between physical condition and susceptibility to influenza. CONCLUSIONS Our findings reveal the laws behind the phenomenon of online disease transmission and provide important evidence for analyzing, predicting, and preventing disease transmission. Also, this study provides theoretical and methodological underpinnings for further exploration and measurement of more factors associated with infection control and public health from social networks.
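As a rough illustration of step (2) in the abstract above, the sketch below counts in-degree from a follower edge list and correlates it with a per-user susceptibility score. The abstract does not say which correlation coefficient was used, so Pearson is an assumption here, and the users, edges, scores, and function names are all hypothetical toy values, not data or code from the study.

```python
from collections import Counter
from math import sqrt


def in_degrees(edges, users):
    """Count incoming edges per user; an edge (a, b) means a follows b."""
    counts = Counter(target for _, target in edges)
    return [counts.get(user, 0) for user in users]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical toy network and susceptibility scores (assumed, not from the paper).
users = ["u1", "u2", "u3", "u4"]
edges = [("u2", "u1"), ("u3", "u1"), ("u4", "u1"),
         ("u1", "u2"), ("u3", "u2"), ("u1", "u3")]
susceptibility = [0.1, 0.3, 0.4, 0.8]

status = in_degrees(edges, users)    # in-degree used as a proxy for network status
r = pearson(status, susceptibility)  # negative when high-status users score lower
```

In this toy data the users with more followers were given lower susceptibility scores, so the coefficient comes out negative, which is the direction of the relationship the abstract reports.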
Affiliation(s)
- Qi Yan
- Management School, Tianjin Normal University, Tianjin, China
| | - Siqing Shan
- School of Economics and Management, Beihang University, Beijing, China
- Beijing Key Laboratory of Emergency Support Simulation Technologies for City Operation, Beijing, China
| | - Baishang Zhang
- Development Research Center of State Administration for Market Regulation of the PR China, Beijing, China
| | - Weize Sun
- School of Economics and Management, Beihang University, Beijing, China
- Beijing Key Laboratory of Emergency Support Simulation Technologies for City Operation, Beijing, China
| | - Menghan Sun
- School of Economics and Management, Beihang University, Beijing, China
- Beijing Key Laboratory of Emergency Support Simulation Technologies for City Operation, Beijing, China
| | - Yiting Luo
- School of Economics and Management, Beihang University, Beijing, China
- Beijing Key Laboratory of Emergency Support Simulation Technologies for City Operation, Beijing, China
| | - Feng Zhao
- School of Economics and Management, Beihang University, Beijing, China
- Beijing Key Laboratory of Emergency Support Simulation Technologies for City Operation, Beijing, China
| | - Xiaoshuang Guo
- School of Economics and Management, Beihang University, Beijing, China
- Beijing Key Laboratory of Emergency Support Simulation Technologies for City Operation, Beijing, China
| |
|
2
|
Morris M, Hayes P, Cox IJ, Lampos V. Neural network models for influenza forecasting with associated uncertainty using Web search activity trends. PLoS Comput Biol 2023; 19:e1011392. [PMID: 37639427 PMCID: PMC10491400 DOI: 10.1371/journal.pcbi.1011392] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/17/2023] [Revised: 09/08/2023] [Accepted: 07/26/2023] [Indexed: 08/31/2023] Open
Abstract
Influenza affects millions of people every year. It causes a considerable number of medical visits and hospitalisations as well as hundreds of thousands of deaths. Forecasting influenza prevalence with good accuracy can significantly help public health agencies react in a timely manner to seasonal or novel strain epidemics. Although significant progress has been made, influenza forecasting remains a challenging modelling task. In this paper, we propose a methodological framework that improves over the state-of-the-art forecasting accuracy of influenza-like illness (ILI) rates in the United States. We achieve this by using Web search activity time series in conjunction with historical ILI rates as observations for training neural network (NN) architectures. The proposed models incorporate Bayesian layers to produce associated uncertainty intervals for their forecast estimates, positioning themselves as legitimate complementary solutions to more conventional approaches. The best performing NN, referred to as the iterative recurrent neural network (IRNN) architecture, reduces mean absolute error by 10.3% and improves skill by 17.1% on average in nowcasting and forecasting tasks across 4 consecutive flu seasons.
Affiliation(s)
- Michael Morris
- University College London, Centre for Artificial Intelligence, Department of Computer Science, London, United Kingdom
| | - Peter Hayes
- University College London, Centre for Artificial Intelligence, Department of Computer Science, London, United Kingdom
| | - Ingemar J. Cox
- University College London, Centre for Artificial Intelligence, Department of Computer Science, London, United Kingdom
- University of Copenhagen, Department of Computer Science, Copenhagen, Denmark
| | - Vasileios Lampos
- University College London, Centre for Artificial Intelligence, Department of Computer Science, London, United Kingdom
| |
|
3
|
Wang Y, Zhou H, Zheng L, Li M, Hu B. Using the Baidu index to predict trends in the incidence of tuberculosis in Jiangsu Province, China. Front Public Health 2023; 11:1203628. [PMID: 37533520 PMCID: PMC10390734 DOI: 10.3389/fpubh.2023.1203628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 07/05/2023] [Indexed: 08/04/2023] Open
Abstract
Objective To analyze the time series in the correlation between search terms related to tuberculosis (TB) and actual incidence data in China. To screen out the "leading" terms and construct a timely and efficient TB prediction model that can predict the next wave of TB epidemic trend in advance. Methods Monthly incidence data of tuberculosis in Jiangsu Province, China, were collected from January 2011 to December 2020. A scoping approach was used to identify TB search terms around common TB terms, prevention, symptoms and treatment. Search terms for Jiangsu Province, China, from January 2011 to December 2020 were collected from the Baidu index database. Correlation coefficients between search terms and actual incidence were calculated using Python 3.6 software. The multiple linear regression model was constructed using SPSS 26.0 software, which also calculated the goodness of fit and prediction error of the model predictions. Results A total of 16 keywords with correlation coefficients greater than 0.6 were screened, of which 11 were the leading terms. The R2 of the prediction model was 0.67 and the MAPE was 10.23%. Conclusion The TB prediction model based on Baidu Index data was able to predict the next wave of TB epidemic trends and intensity 2 months in advance. This forecasting model is currently only available for Jiangsu Province.
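The screening of "leading" terms and the MAPE figure described above can be illustrated with a small sketch. It assumes that a leading term is one whose search volume correlates best with incidence some months later; the abstract does not spell out the exact screening rule, so this reading, the helper names, and the toy series are all assumptions rather than the authors' procedure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def lagged_correlation(search, incidence, lag):
    """Correlate search volume in month t with incidence in month t + lag."""
    if lag == 0:
        return pearson(search, incidence)
    return pearson(search[:-lag], incidence[lag:])


def best_lead(search, incidence, max_lag=3):
    """Return (lag, r) for the lag in 0..max_lag with the highest correlation."""
    scores = [(lag, lagged_correlation(search, incidence, lag))
              for lag in range(max_lag + 1)]
    return max(scores, key=lambda item: item[1])


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical monthly series: incidence echoes search volume two months later.
search = [10, 12, 18, 30, 22, 15, 11, 13, 25, 33, 20, 14]
incidence = [50, 52] + [5 * s for s in search[:-2]]

lag, r = best_lead(search, incidence)
error = mape([100, 200], [90, 220])
```

A term would count as leading under this reading when its best lag is greater than zero; in the toy data the best lag is two months by construction.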
|
4
|
Sun J, Yuan K, Chen C, Xu H, Wang H, Zhi Y, Peng S, Peng CK, Huang N, Huang G, Yang A. Causality Network of Infectious Disease Revealed With Causal Decomposition. IEEE J Biomed Health Inform 2023; 27:3657-3665. [PMID: 37071521 DOI: 10.1109/jbhi.2023.3268081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2023]
Abstract
Causal inference in the field of infectious disease attempts to gain insight into the potential causal nature of an association between risk factors and diseases. Simulated causality inference experiments have shown preliminary promise in improving understanding of the transmission of infectious diseases but still lack sufficient quantitative causal inference studies based on real-world data. Here, we investigate the causal interactions between three different infectious diseases and related factors, using causal decomposition analysis, to characterize the nature of infectious disease transmission. We show that the complex interactions between infectious disease and human behavior have a quantifiable impact on transmission efficiency of infectious diseases. Our findings, by shedding light on the underlying transmission mechanism of infectious diseases, suggest that causal inference analysis is a promising approach to determine epidemiological interventions.
|
5
|
Lin C, Zhou J, Zhang J, Yang C, Agichtein E. Graph Neural Network Modeling of Web Search Activity for Real-time Pandemic Forecasting. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS 2023; 2023:128-137. [PMID: 38332952 PMCID: PMC10853009 DOI: 10.1109/ichi57859.2023.00027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2024]
Abstract
The utilization of web search activity for pandemic forecasting has significant implications for managing disease spread and informing policy decisions. However, web search records tend to be noisy and influenced by geographical location, making it difficult to develop large-scale models. While regularized linear models have been effective in predicting the spread of respiratory illnesses like COVID-19, they are limited to specific locations. The lack of incorporation of neighboring areas' data and the inability to transfer models to new locations with limited data has impeded further progress. To address these limitations, this study proposes a novel self-supervised message-passing neural network (SMPNN) framework for modeling local and cross-location dynamics in pandemic forecasting. The SMPNN framework utilizes an MPNN module to learn cross-location dependencies through self-supervised learning and improve local predictions with graph-generated features. The framework is designed as an end-to-end solution and is compared with state-of-the-art statistical and deep learning models using COVID-19 data from England and the US. The results of the study demonstrate that the SMPNN model outperforms other models by achieving up to a 6.9% improvement in prediction accuracy and lower prediction errors during the early stages of disease outbreaks. This approach represents a significant advancement in disease surveillance and forecasting, providing a novel methodology, datasets, and insights that combine web search data and spatial information. The proposed SMPNN framework offers a promising avenue for modeling the spread of pandemics, leveraging both local and cross-location information, and has the potential to inform public health policy decisions.
Affiliation(s)
- Chen Lin
- Department of Computer Science, Emory University, Atlanta, USA
| | - Jianghong Zhou
- Department of Computer Science, Emory University, Atlanta, USA
| | - Jing Zhang
- Department of Computer Science, Emory University, Atlanta, USA
| | - Carl Yang
- Department of Computer Science, Emory University, Atlanta, USA
| | | |
|
6
|
Luca M, Campedelli GM, Centellegher S, Tizzoni M, Lepri B. Crime, inequality and public health: a survey of emerging trends in urban data science. Front Big Data 2023; 6:1124526. [PMID: 37303974 PMCID: PMC10248183 DOI: 10.3389/fdata.2023.1124526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 05/10/2023] [Indexed: 06/13/2023] Open
Abstract
Urban agglomerations are constantly and rapidly evolving ecosystems, with globalization and increasing urbanization posing new challenges in sustainable urban development well summarized in the United Nations' Sustainable Development Goals (SDGs). The advent of the digital age generated by modern alternative data sources provides new tools to tackle these challenges with spatio-temporal scales that were previously unavailable with census statistics. In this review, we present how new digital data sources are employed to provide data-driven insights to study and track (i) urban crime and public safety; (ii) socioeconomic inequalities and segregation; and (iii) public health, with a particular focus on the city scale.
Affiliation(s)
- Massimiliano Luca
- Mobile and Social Computing Lab, Bruno Kessler Foundation, Trento, Italy
- Faculty of Computer Science, Free University of Bolzano, Bolzano, Italy
| | | | | | - Michele Tizzoni
- Department of Sociology and Social Research, University of Trento, Trento, Italy
| | - Bruno Lepri
- Mobile and Social Computing Lab, Bruno Kessler Foundation, Trento, Italy
| |
|
7
|
Jang B, Kim I, Kim JW. Long-Term Influenza Outbreak Forecast Using Time-Precedence Correlation of Web Data. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:2400-2412. [PMID: 34469319 DOI: 10.1109/tnnls.2021.3106637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Influenza leads to many deaths every year and is a threat to human health. For effective prevention, traditional national-scale statistical surveillance systems have been developed, and numerous studies have been conducted to predict influenza outbreaks using web data. Most studies have captured the short-term signs of influenza outbreaks, such as one-week prediction using the characteristics of web data uploaded in real time; however, long-term predictions of more than 2-10 weeks are required to effectively cope with influenza outbreaks. In this study, we determined that web data uploaded in real time have a time-precedence relationship with influenza outbreaks. For example, a few weeks before an influenza pandemic, the word "colds" appears frequently in web data. The web data after the appearance of the word "colds" can be used as information for forecasting future influenza outbreaks, which can improve long-term influenza prediction accuracy. In this study, we propose a novel long-term influenza outbreak forecast model utilizing the time precedence between the emergence of web data and an influenza outbreak. Based on the proposed model, we conducted experiments on: 1) selecting suitable web data for long-term influenza prediction; 2) determining whether the proposed model is regionally dependent; and 3) evaluating the accuracy according to the prediction timeframe. The proposed model showed a correlation of 0.87 in the long-term prediction of ten weeks while significantly outperforming other state-of-the-art methods.
|
8
|
Kandula S, Olfson M, Gould MS, Keyes KM, Shaman J. Hindcasts and forecasts of suicide mortality in US: A modeling study. PLoS Comput Biol 2023; 19:e1010945. [PMID: 36913441 PMCID: PMC10047563 DOI: 10.1371/journal.pcbi.1010945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 03/28/2023] [Accepted: 02/13/2023] [Indexed: 03/14/2023] Open
Abstract
Deaths by suicide, as well as suicidal ideations, plans and attempts, have been increasing in the US for the past two decades. Deployment of effective interventions would require timely, geographically well-resolved estimates of suicide activity. In this study, we evaluated the feasibility of a two-step process for predicting suicide mortality: a) generation of hindcasts, mortality estimates for past months for which observational data would not have been available if forecasts were generated in real-time; and b) generation of forecasts with observational data augmented with hindcasts. Calls to crisis hotline services and online queries to the Google search engine for suicide-related terms were used as proxy data sources to generate hindcasts. The primary hindcast model (auto) is an Autoregressive Integrated Moving Average (ARIMA) model, trained on suicide mortality rates alone. Three regression models augment hindcast estimates from auto with call rates (calls), GHT search rates (ght) and both datasets together (calls_ght). The 4 forecast models used are ARIMA models trained with corresponding hindcast estimates. All models were evaluated against a baseline random walk with drift model. Rolling monthly 6-month ahead forecasts for all 50 states between 2012 and 2020 were generated. Quantile score (QS) was used to assess the quality of the forecast distributions. Median QS for auto was better than baseline (0.114 vs. 0.21). Median QS of augmented models was lower than auto, but not significantly different from each other (Wilcoxon signed-rank test, p > .05). Forecasts from augmented models were also better calibrated. Together, these results provide evidence that proxy data can address delays in release of suicide mortality data and improve forecast quality.
An operational forecast system of state-level suicide risk may be feasible with sustained engagement between modelers and public health departments to appraise data sources and methods as well as to continuously evaluate forecast accuracy.
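Two ingredients named in this abstract, the random walk with drift baseline and a quantile-based score, can be sketched in a few lines. The abstract does not define QS precisely, so the pinball (quantile) loss below is an assumed stand-in for illustration; the function names and numbers are hypothetical.

```python
def drift_forecast(history, horizon):
    """Random walk with drift: last value plus the mean historical step times h."""
    drift = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + h * drift for h in range(1, horizon + 1)]


def pinball_loss(observed, quantile_forecast, q):
    """Pinball loss of one quantile forecast at level q; lower is better."""
    diff = observed - quantile_forecast
    return max(q * diff, (q - 1) * diff)


def mean_quantile_score(observed, forecasts_by_level):
    """Average pinball loss over a dict mapping quantile level to forecast value."""
    losses = [pinball_loss(observed, forecast, level)
              for level, forecast in forecasts_by_level.items()]
    return sum(losses) / len(losses)


# Hypothetical monthly rates and a three-step-ahead baseline forecast.
baseline = drift_forecast([10, 12, 14, 16], horizon=3)

# Hypothetical forecast quantiles for a single month whose observed value is 10.
score = mean_quantile_score(10, {0.1: 7.0, 0.5: 9.5, 0.9: 13.0})
```

Under this loss, a forecast distribution that places its quantiles closer to the observed value scores lower, which matches the abstract's convention that a lower median QS is better.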
Affiliation(s)
- Sasikiran Kandula
- Department of Environmental Health Sciences, Columbia University, New York, New York, United States of America
| | - Mark Olfson
- Department of Epidemiology, Columbia University, New York, New York, United States of America
- Department of Psychiatry, Columbia University, New York, New York, United States of America
| | - Madelyn S. Gould
- Department of Epidemiology, Columbia University, New York, New York, United States of America
- Department of Psychiatry, Columbia University, New York, New York, United States of America
| | - Katherine M. Keyes
- Department of Epidemiology, Columbia University, New York, New York, United States of America
| | - Jeffrey Shaman
- Department of Environmental Health Sciences, Columbia University, New York, New York, United States of America
| |
|
9
|
Mavragani A, Yousefi S, Kahoro E, Karisani P, Liang D, Sarnat J, Agichtein E. Detecting Elevated Air Pollution Levels by Monitoring Web Search Queries: Algorithm Development and Validation. JMIR Form Res 2022; 6:e23422. [PMID: 36534457 PMCID: PMC9808603 DOI: 10.2196/23422] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/30/2022] [Revised: 10/06/2022] [Accepted: 10/25/2022] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Real-time air pollution monitoring is a valuable tool for public health and environmental surveillance. In recent years, there has been a dramatic increase in air pollution forecasting and monitoring research using artificial neural networks. Most prior work relied on modeling pollutant concentrations collected from ground-based monitors and meteorological data for long-term forecasting of outdoor ozone (O3), oxides of nitrogen, and fine particulate matter (PM2.5). Given that traditional, highly sophisticated air quality monitors are expensive and not universally available, these models cannot adequately serve those not living near pollutant monitoring sites. Furthermore, because prior models were built based on physical measurement data collected from sensors, they may not be suitable for predicting the public health effects of pollution exposure. OBJECTIVE This study aimed to develop and validate models to nowcast the observed pollution levels using web search data, which are publicly available in near real time from major search engines. METHODS We developed novel machine learning-based models using both traditional supervised classification methods and state-of-the-art deep learning methods to detect elevated air pollution levels at the US city level by using generally available meteorological data and aggregate web-based search volume data derived from Google Trends. We validated the performance of these methods by predicting 3 critical air pollutants (O3, nitrogen dioxide, and PM2.5) across 10 major US metropolitan statistical areas in 2017 and 2018. We also explore different variations of the long short-term memory model and propose a novel search term dictionary learner-long short-term memory model to learn sequential patterns across multiple search terms for prediction. 
RESULTS The top-performing model was a deep neural sequence model long short-term memory, using meteorological and web search data, and reached an accuracy of 0.82 (F1-score 0.51) for O3, 0.74 (F1-score 0.41) for nitrogen dioxide, and 0.85 (F1-score 0.27) for PM2.5, when used for detecting elevated pollution levels. Compared with using only meteorological data, the proposed method achieved superior accuracy by incorporating web search data. CONCLUSIONS The results show that incorporating web search data with meteorological data improves the nowcasting performance for all 3 pollutants and suggest promising novel applications for tracking global physical phenomena using web search data.
Affiliation(s)
| | - Safoora Yousefi
- Department of Computer Science, Emory University, Atlanta, GA, United States
| | - Elvis Kahoro
- Department of Computer Science, Pomona College, Claremont, CA, United States
| | - Payam Karisani
- Department of Computer Science, Emory University, Atlanta, GA, United States
| | - Donghai Liang
- Department of Environmental Health, Emory University, Atlanta, GA, United States
| | - Jeremy Sarnat
- Department of Environmental Health, Emory University, Atlanta, GA, United States
| | - Eugene Agichtein
- Department of Computer Science, Emory University, Atlanta, GA, United States
| |
|
10
|
CALABRÒ GIOVANNAELISA, ICARDI GIANCARLO, BONANNI PAOLO, GABUTTI GIOVANNI, VITALE FRANCESCO, RIZZO CATERINA, CICCHETTI AMERICO, STAIANO ANNAMARIA, ANSALDI FILIPPO, ORSI ANDREA, DE WAURE CHIARA, PANATTO DONATELLA, AMICIZIA DANIELA, BERT FABRIZIO, VILLANI ALBERTO, IERACI ROBERTO, CONVERSANO MICHELE, RUSSO CARMELA, RUMI FILIPPO, SCOTTI SILVESTRO, MAIO TOMMASA, RUSSO ROCCO, VACCARO CONCETTAMARIA, SILIQUINI ROBERTA, RICCIARDI WALTER. [Flu vaccination and value-based health care: operational solutions to safeguard public health]. JOURNAL OF PREVENTIVE MEDICINE AND HYGIENE 2022; 63:E1-E85. [PMID: 36310765 PMCID: PMC9586154 DOI: 10.15167/2421-4248/jpmh2022.63.2s2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Affiliation(s)
- GIOVANNA ELISA CALABRÒ
- Sezione di Igiene, Dipartimento Universitario di Scienze della Vita e Sanità Pubblica, Università Cattolica del Sacro Cuore, Roma
- VIHTALI - Value In Health Technology and Academy for Leadership & Innovation, Spin-Off dell'Università Cattolica del Sacro Cuore, Roma
| | - GIANCARLO ICARDI
- Dipartimento di Scienze della Salute, Università degli Studi di Genova
- U.O. Igiene, IRCCS Ospedale Policlinico San Martino, Genova
| | - PAOLO BONANNI
- Dipartimento di Scienze della Salute (DSS), Università di Firenze
| | - GIOVANNI GABUTTI
- Coordinatore Nazionale GdL Vaccini e Politiche Vaccinali della SItI
| | - FRANCESCO VITALE
- Dipartimento Promozione della Salute, Materno-Infantile, di Medicina Interna e Specialistica di Eccellenza “G. D’Alessandro”, Università degli Studi di Palermo
| | - CATERINA RIZZO
- Dipartimento di ricerca traslazionale e nuove tecnologie in medicina e chirurgia, Università degli Studi di Pisa
| | - AMERICO CICCHETTI
- Alta Scuola di Economia e Management dei Sistemi Sanitari (ALTEMS), Università Cattolica del Sacro Cuore, Roma
| | - ANNAMARIA STAIANO
- Dipartimento di Scienze Mediche Traslazionali, Università degli Studi “Federico II”, Napoli
- Presidente Società Italiana di Pediatria (SIP)
| | - FILIPPO ANSALDI
- Dipartimento di Scienze della Salute, Università degli Studi di Genova
- A.Li.Sa. Azienda Ligure Sanitaria Regione Liguria
| | - ANDREA ORSI
- Dipartimento di Scienze della Salute, Università degli Studi di Genova
- U.O. Igiene, IRCCS Ospedale Policlinico San Martino, Genova
| | - CHIARA DE WAURE
- Dipartimento di Medicina e Chirurgia, Università degli Studi di Perugia
| | - DONATELLA PANATTO
- Dipartimento di Scienze della Salute, Università degli Studi di Genova
| | - DANIELA AMICIZIA
- Dipartimento di Scienze della Salute, Università degli Studi di Genova
- A.Li.Sa. Azienda Ligure Sanitaria Regione Liguria
| | - FABRIZIO BERT
- Dipartimento di Scienze della Sanità Pubblica e Pediatriche, Università degli Studi di Torino
- SSDU Igiene Ospedaliera e Governo delle Infezioni Correlate all’Assistenza, ASL TO3
| | - ALBERTO VILLANI
- Dipartimento Emergenza Accettazione Ospedale Pediatrico Bambino Gesù, IRCCS, Roma
- Dipartimento di Medicina dei Sistemi, Università di Roma Tor Vergata
| | - ROBERTO IERACI
- Strategie vaccinali, Regione Lazio
- Ricercatore associato CID Ethics-CNR
| | | | - CARMELA RUSSO
- U.O.S.V.D. Epidemiologia - Comunicazione e Formazione Coordinamento delle Attività di Promozione della Salute e di Educazione Sanitaria, ASL Taranto
| | - FILIPPO RUMI
- Alta Scuola di Economia e Management dei Sistemi Sanitari (ALTEMS), Università Cattolica del Sacro Cuore, Roma
| | | | - TOMMASA MAIO
- Federazione Italiana Medici di Medicina Generale (FIMMG)
| | - ROCCO RUSSO
- Coordinatore tavolo tecnico vaccinazioni, Società Italiana di Pediatria (SIP)
| | | | - ROBERTA SILIQUINI
- Dipartimento di Scienze della Sanità Pubblica e Pediatriche, Università degli Studi di Torino
- AOU Città della Salute e della Scienza di Torino
| | - WALTER RICCIARDI
- Sezione di Igiene, Dipartimento Universitario di Scienze della Vita e Sanità Pubblica, Università Cattolica del Sacro Cuore, Roma
| |
|
11
|
Robins K, Leonard AFC, Farkas K, Graham DW, Jones DL, Kasprzyk-Hordern B, Bunce JT, Grimsley JMS, Wade MJ, Zealand AM, McIntyre-Nolan S. Research needs for optimising wastewater-based epidemiology monitoring for public health protection. JOURNAL OF WATER AND HEALTH 2022; 20:1284-1313. [PMID: 36170187 DOI: 10.2166/wh.2022.026] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/16/2023]
Abstract
Wastewater-based epidemiology (WBE) is an unobtrusive method used to observe patterns in illicit drug use, poliovirus, and severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). The pandemic and need for surveillance measures have led to the rapid acceleration of WBE research and development globally. With the infrastructure available to monitor SARS-CoV-2 from wastewater in 58 countries globally, there is potential to expand targets and applications for public health protection, such as other viral pathogens, antimicrobial resistance (AMR), pharmaceutical consumption, or exposure to chemical pollutants. Some applications have been explored in academic research but are not used to inform public health decision-making. We reflect on the current knowledge of WBE for these applications and identify barriers and opportunities for expanding beyond SARS-CoV-2. This paper critically reviews the applications of WBE for public health and identifies the important research gaps for WBE to be a useful tool in public health. It considers possible uses for pathogenic viruses, AMR, and chemicals. It summarises the current evidence on the following: (1) the presence of markers in stool and urine; (2) environmental factors influencing persistence of markers in wastewater; (3) methods for sample collection and storage; (4) prospective methods for detection and quantification; (5) reducing uncertainties; and (6) further considerations for public health use.
Affiliation(s)
- Katie Robins
- Environmental Monitoring for Health Protection, UK Health Security Agency, Nobel House, London SW1P 3HX, UK; School of Engineering, Newcastle University, Cassie Building, Newcastle-upon-Tyne NE1 7RU, UK
| | - Anne F C Leonard
- Environmental Monitoring for Health Protection, UK Health Security Agency, Nobel House, London SW1P 3HX, UK; University of Exeter Medical School, European Centre for Environment and Human Health, University of Exeter, Cornwall TR10 9FE, UK
| | - Kata Farkas
- School of Natural Sciences, Bangor University, Bangor, Gwynedd LL57 2UW, UK
| | - David W Graham
- School of Engineering, Newcastle University, Cassie Building, Newcastle-upon-Tyne NE1 7RU, UK
| | - David L Jones
- School of Natural Sciences, Bangor University, Bangor, Gwynedd LL57 2UW, UK; SoilsWest, Centre for Sustainable Farming Systems, Food Futures Institute, Murdoch University, Murdoch, WA 6105, Australia
| | | | - Joshua T Bunce
- Environmental Monitoring for Health Protection, UK Health Security Agency, Nobel House, London SW1P 3HX, UK; School of Engineering, Newcastle University, Cassie Building, Newcastle-upon-Tyne NE1 7RU, UK
| | - Jasmine M S Grimsley
- Environmental Monitoring for Health Protection, UK Health Security Agency, Nobel House, London SW1P 3HX, UK
| | - Matthew J Wade
- Environmental Monitoring for Health Protection, UK Health Security Agency, Nobel House, London SW1P 3HX, UK; School of Engineering, Newcastle University, Cassie Building, Newcastle-upon-Tyne NE1 7RU, UK
| | - Andrew M Zealand
- Environmental Monitoring for Health Protection, UK Health Security Agency, Nobel House, London SW1P 3HX, UK
| | - Shannon McIntyre-Nolan
- Environmental Monitoring for Health Protection, UK Health Security Agency, Nobel House, London SW1P 3HX, UK; Her Majesty's Prison and Probation Service, Ministry of Justice, London, SW1H 9AJ, UK
| |
|
12
|
Han Q, Liu Z, Jia J, Anderson BT, Xu W, Shi P. Web-Based Data to Quantify Meteorological and Geographical Effects on Heat Stroke: Case Study in China. GEOHEALTH 2022; 6:e2022GH000587. [PMID: 35949256 PMCID: PMC9356531 DOI: 10.1029/2022gh000587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Revised: 06/16/2022] [Accepted: 06/28/2022] [Indexed: 06/15/2023]
Abstract
Heat stroke is a serious heat-related health outcome that can eventually lead to death. Due to the poor accessibility of heat stroke data, the large-scale relationship between heat stroke and meteorological factors is still unclear. This work aims to clarify the potential relationship between meteorological variables and heat stroke, and quantify the meteorological threshold that affected the severity of heat stroke. We collected daily heat stroke search index (HSSI) and meteorological data for the period 2013-2020 in 333 Chinese cities to analyze the relationship between meteorological variables and HSSI using correlation analysis and a random forest (RF) model. Temperature and relative humidity (RH) accounted for 62% and 9% of the changes of HSSI, respectively. In China, cases of heat stroke may start to occur when temperature exceeds 36°C and RH exceeds 58%. This threshold was 34.5°C and 79% in the north of China, and 36°C and 48% in the south of China. Compared to RH, the threshold of temperature showed a more evident difference affected by altitude and distance from the ocean, which was 35.5°C in inland cities and 36.5°C in coastal cities; 35.5°C in high-altitude cities and 36°C in low-altitude cities. Our findings provide a possible way to analyze the interaction effect of meteorological variables on heat-related illnesses, and emphasize the effects of geographical environment. The meteorological threshold quantified in this research can also help policymakers establish a better meteorological warning system for public health.
Affiliation(s)
- Qinmei Han
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Beijing Normal University, Beijing, China
- Academy of Disaster Reduction and Emergency Management, Ministry of Emergency Management and Ministry of Education, Beijing Normal University, Beijing, China
- Faculty of Geographical Science, Beijing Normal University, Beijing, China
- Zhao Liu
- School of Linkong Economics and Management, Beijing Institute of Economics and Management, Beijing, China
- Junwen Jia
- School of System Science, Beijing Normal University, Beijing, China
- Wei Xu
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Beijing Normal University, Beijing, China
- Academy of Disaster Reduction and Emergency Management, Ministry of Emergency Management and Ministry of Education, Beijing Normal University, Beijing, China
- Faculty of Geographical Science, Beijing Normal University, Beijing, China
- Peijun Shi
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Beijing Normal University, Beijing, China
- Academy of Disaster Reduction and Emergency Management, Ministry of Emergency Management and Ministry of Education, Beijing Normal University, Beijing, China
- Faculty of Geographical Science, Beijing Normal University, Beijing, China
- Academy of Plateau Science and Sustainability, People's Government of Qinghai Province and Beijing Normal University, Xining, China
|
13
|
Fan B, Peng J, Guo H, Gu H, Xu K, Wu T. Accurate Forecasting of Emergency Department Arrivals With Internet Search Index and Machine Learning Models: Model Development and Performance Evaluation. JMIR Med Inform 2022; 10:e34504. [PMID: 35857360 PMCID: PMC9350824 DOI: 10.2196/34504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2021] [Revised: 04/22/2022] [Accepted: 05/25/2022] [Indexed: 11/13/2022]
Abstract
BACKGROUND Emergency department (ED) overcrowding is a concerning global health care issue, which is mainly caused by the uncertainty of patient arrivals, especially during the pandemic. Accurate forecasting of patient arrivals can allow health resource allocation in advance to reduce overcrowding. Currently, traditional data, such as historical patient visits, weather, holiday, and calendar, are primarily used to create forecasting models. However, data from an internet search engine (eg, Google) are less studied, although they can provide pivotal real-time surveillance information. The internet data can be employed to improve forecasting performance and provide early warning, especially during the epidemic. Moreover, possible nonlinearities between patient arrivals and these variables are often ignored. OBJECTIVE This study aims to develop an intelligent forecasting system with machine learning models and internet search index to provide an accurate prediction of ED patient arrivals, to verify the effectiveness of the internet search index, and to explore whether nonlinear models can improve the forecasting accuracy. METHODS Data on ED patient arrivals were collected from July 12, 2009, to June 27, 2010, the period of the 2009 H1N1 pandemic. These included 139,910 ED visits in our collaborative hospital, which is one of the biggest public hospitals in Hong Kong. Traditional data were also collected during the same period. The internet search index was generated from 268 search queries on Google to comprehensively capture the information about potential patients. The relationship between the index and patient arrivals was verified by Pearson correlation coefficient, Johansen cointegration, and Granger causality. Linear and nonlinear models were then developed with the internet search index to predict patient arrivals. The accuracy and robustness were also examined. RESULTS All models could accurately predict patient arrivals. The causality test indicated that the internet search index is a strong predictor of ED patient arrivals. With the internet search index, the mean absolute percentage error (MAPE) and the root mean square error (RMSE) of the linear model decreased from 5.3% to 5.0% and from 24.44 to 23.18, respectively, whereas the MAPE and RMSE of the nonlinear model decreased even more, from 3.5% to 3% and from 16.72 to 14.55, respectively. A comparison of the experimental results revealed that the forecasting system with extreme learning machine, together with the internet search index, had the best performance in both forecasting accuracy and robustness analysis. CONCLUSIONS The proposed forecasting system can make accurate, real-time prediction of ED patient arrivals. Compared with the static traditional variables, the internet search index significantly improves forecasting as a reliable predictor monitoring continuous behavior trend and sudden changes during the epidemic (P=.002). The nonlinear model performs better than its linear counterparts by capturing the dynamic relationship between the index and patient arrivals. Thus, the system can facilitate staff planning and workflow monitoring.
Affiliation(s)
- Bi Fan
- College of Management, Institute of Business Analysis and Supply Chain Management, Shenzhen University, Shenzhen, China
- Jiaxuan Peng
- Faculty of Science, University of St Andrews, St Andrews, United Kingdom
- Hainan Guo
- College of Management, Institute of Business Analysis and Supply Chain Management, Shenzhen University, Shenzhen, China
- Haobin Gu
- School of Management Science and Engineering, Dongbei University of Finance and Economics, Dalian, China
- Kangkang Xu
- School of Electromechanical Engineering, Guangdong University of Technology, Guangzhou, China
- Tingting Wu
- College of Management, Institute of Business Analysis and Supply Chain Management, Shenzhen University, Shenzhen, China
|
14
|
Gravino P, Prevedello G, Galletti M, Loreto V. The supply and demand of news during COVID-19 and assessment of questionable sources production. Nat Hum Behav 2022; 6:1069-1078. [PMID: 35606514 DOI: 10.1038/s41562-022-01353-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/31/2021] [Accepted: 04/14/2022] [Indexed: 11/09/2022]
Abstract
Misinformation threatens our societies, but little is known about how the production of news by unreliable sources relates to supply and demand dynamics. We exploit the burst of news production triggered by the COVID-19 outbreak through an Italian database partially annotated for questionable sources. We compare news supply with news demand, as captured by Google Trends data. We identify the Granger causal relationships between supply and demand for the most searched keywords, quantifying the inertial behaviour of the news supply. Focusing on COVID-19 news, we find that questionable sources are more sensitive than general news production to people's interests, especially when news supply and demand are mismatched. We introduce an index assessing the level of questionable news production solely based on the available volumes of news and searches. We contend that these results can be a powerful asset in informing campaigns against disinformation and providing news outlets and institutions with potentially relevant strategies.
Affiliation(s)
- Vittorio Loreto
- Sony Computer Science Laboratories, Paris, France
- Physics Department, Sapienza University of Rome, Rome, Italy
- Complexity Science Hub Vienna, Vienna, Austria
|
15
|
Khakimova A, Abdollahi L, Zolotarev O, Rahim F. Global interest in vaccines during the COVID-19 pandemic: Evidence from Google Trends. Vaccine X 2022; 10:100152. [PMID: 35291263 PMCID: PMC8915451 DOI: 10.1016/j.jvacx.2022.100152] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/06/2021] [Revised: 10/23/2021] [Accepted: 02/21/2022] [Indexed: 12/16/2022]
Abstract
COVID-19 (coronavirus disease 2019) vaccines have become available; now, everyone has the opportunity to get vaccinated. We used Google Trends (GT) data to assess the global public interest in COVID-19 vaccines during the pandemic. For the analysis, a period of 17 months was chosen (from Jan 19, 2020, to Jul 04, 2021). Interest in user queries was tracked by keywords (corona vaccine, COVID-19 vaccine development, Sputnik v, Pfizer vaccine, AstraZeneca vaccine, etc.). A geographic analysis of queries was also carried out. User interest in the vaccine is increasing significantly. It is focused on the side effects of vaccines, and users pay attention to vaccine developers from different countries. No correlation was found between scientific publications devoted to vaccine development and such user queries on the internet. This study shows that internet search patterns can be used to gauge public attitudes towards coronavirus vaccination. Safety concerns remain consistently high, in line with the interest in vaccine side effects. These data can be used to track and predict attitudes towards vaccination against COVID-19 among populations in different countries before global vaccination becomes available to help mitigate the adverse effects of the pandemic.
Affiliation(s)
- Aida Khakimova
- Department of Development of Scientific and Innovation Activities, Russian New University, Moscow, Russia
- Leila Abdollahi
- Department of Medical Library and Information Science, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
- Oleg Zolotarev
- Department of Information Systems in Economics and Management, Russian New University, Moscow, Russia
- Fakher Rahim
- Metabolomics and Genomics Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Health Research Institute, Thalassemia and Hemoglobinopathy Research Centre, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
|
16
|
Yom-Tov E, Lampos V, Inns T, Cox IJ, Edelstein M. Providing early indication of regional anomalies in COVID-19 case counts in England using search engine queries. Sci Rep 2022; 12:2373. [PMID: 35149764 PMCID: PMC8837788 DOI: 10.1038/s41598-022-06340-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/05/2021] [Accepted: 01/28/2022] [Indexed: 11/09/2022]
Abstract
Prior work has shown the utility of using Internet searches to track the incidence of different respiratory illnesses. Similarly, people who suffer from COVID-19 may query for their symptoms prior to accessing the medical system (or in lieu of it). To assist in the UK government's response to the COVID-19 pandemic we analyzed searches for relevant symptoms on the Bing web search engine from users in England to identify areas of the country where unexpected rises in relevant symptom searches occurred. These were reported weekly to the UK Health Security Agency to assist in their monitoring of the pandemic. Our analysis shows that searches for "fever" and "cough" were the most correlated with future case counts during the initial stages of the pandemic, with searches preceding case counts by up to 21 days. Unexpected rises in search patterns were predictive of anomalous rises in future case counts within a week, reaching an Area Under Curve of 0.82 during the initial phase of the pandemic, and later reducing due to changes in symptom presentation. Thus, analysis of regional searches for symptoms can provide an early indicator (of more than one week) of increases in COVID-19 case counts.
Affiliation(s)
- Elad Yom-Tov
- Microsoft Research, Herzliya, Israel.
- Faculty of Industrial Engineering and Management, Technion, Haifa, Israel.
- Vasileios Lampos
- Department of Computer Science, University College London, London, UK
- Thomas Inns
- UK Health Security Agency, London, UK
- St Helens and Knowsley Teaching Hospitals NHS Trust, Merseyside, UK
- Ingemar J Cox
- Department of Computer Science, University College London, London, UK
- Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
|
17
|
Oto OA, Kardeş S, Guller N, Safak S, Dirim AB, Başhan Y, Demir E, Artan AS, Yazıcı H, Turkmen A. Impact of the COVID-19 pandemic on interest in renal diseases. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2022; 29:711-718. [PMID: 34341920 PMCID: PMC8328136 DOI: 10.1007/s11356-021-15675-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2021] [Accepted: 07/23/2021] [Indexed: 06/13/2023]
Abstract
There is an information gap about the public's interest in nephrological diseases in the COVID-19 era. The objective was to identify public interest in kidney diseases during the pandemic. In this infodemiology study, Google Trends was queried for a total of 50 search queries corresponding to a broad spectrum of nephrological diseases and the term "nephrologist." Two time intervals of 2020 (March 15-July 4 and July 5-October 31) were compared to similar time intervals of 2016-2019 to provide information on interest in different phases of the pandemic. Compared to the prior 4 years, analyses showed significant decreases in relative search volume (RSV) in the majority (76%) of search queries in the March 15-July 4, 2020 period. However, RSV of the majority of search queries (≈70%) in the July 5-October 31, 2020 period was not significantly different from similar periods of the previous 4 years, with an increase in the search terms amyloidosis, kidney biopsy, hematuria, chronic kidney disease, hypertension, nephrolithiasis, acute kidney injury, and Fabry disease. During the early pandemic, there were significant decreases in search volumes for many nephrological diseases. However, this trend reversed in the period from July 5 to October 31, 2020, implying an increased need for information on kidney diseases. The results of this study enable us to understand how COVID-19 impacted the interest in kidney diseases and demands/needs for kidney diseases by the general public during the pandemic.
Affiliation(s)
- Ozgur Akin Oto
- Department of Nephrology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey.
- Sinan Kardeş
- Department of Medical Ecology and Hydroclimatology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey
- Nurane Guller
- Department of Nephrology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey
- Seda Safak
- Department of Nephrology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey
- Ahmet Burak Dirim
- Department of Nephrology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey
- Yağmur Başhan
- Department of Nephrology, Haseki Education Research Hospital, Istanbul, Turkey
- Erol Demir
- Department of Nephrology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey
- Ayse Serra Artan
- Department of Nephrology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey
- Halil Yazıcı
- Department of Nephrology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey
- Aydın Turkmen
- Department of Nephrology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey
|
18
|
Abstract
Syndromic surveillance systems monitor disease indicators to detect emergence of diseases and track their progression. Here, we report on a rapidly deployed active syndromic surveillance system for tracking COVID-19 in Israel. The system was a novel combination of active and passive components: Ads were shown to people searching for COVID-19 symptoms on the Google search engine. Those who clicked on the ads were referred to a chat bot which helped them decide whether they needed urgent medical care. Through its conversion optimization mechanism, the ad system was guided to focus on those people who required such care. Over 6 months, the ads were shown approximately 214,000 times and clicked on 12,000 times, and 722 people were informed they needed urgent care. Click rates on ads and the fraction of people deemed to require urgent care were correlated with the hospitalization rate ([Formula: see text] and [Formula: see text], respectively) with a lead time of 9 days. Males and younger people were more likely to use the system, and younger people were more likely to be determined to require urgent care (slope: [Formula: see text], [Formula: see text]). Thus, the system can assist in predicting case numbers and hospital load at a significant lead time and, simultaneously, help people determine if they need medical care.
Affiliation(s)
- Elad Yom-Tov
- Microsoft Research, Alan Turing 3, Hertzliya, 4672415, Israel.
- Faculty of Industrial Engineering and Management, Technion, Haifa, 3200000, Israel.
|
19
|
Cai O, Sousa-Pinto B. United States Influenza Search Patterns Since the Emergence of COVID-19: Infodemiology Study. JMIR Public Health Surveill 2021; 8:e32364. [PMID: 34878996 PMCID: PMC8896565 DOI: 10.2196/32364] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/24/2021] [Revised: 10/30/2021] [Accepted: 11/30/2021] [Indexed: 12/11/2022]
Abstract
Background The emergence and media coverage of COVID-19 may have affected influenza search patterns, possibly affecting influenza surveillance results using Google Trends. Objective We aimed to investigate if the emergence of COVID-19 was associated with modifications in influenza search patterns in the United States. Methods We retrieved US Google Trends data (relative number of searches for specified terms) for the topics influenza, Coronavirus disease 2019, and symptoms shared between influenza and COVID-19. We calculated the correlations between influenza and COVID-19 search data for a 1-year period after the first COVID-19 diagnosis in the United States (January 21, 2020 to January 20, 2021). We constructed a seasonal autoregressive integrated moving average model and compared predicted search volumes, using the 4 previous years, with Google Trends relative search volume data. We built a similar model for shared symptoms data. We also assessed correlations for the past 5 years between Google Trends influenza data, US Centers for Disease Control and Prevention influenza-like illness data, and influenza media coverage data. Results We observed a nonsignificant weak correlation (ρ= –0.171; P=0.23) between COVID-19 and influenza Google Trends data. Influenza search volumes for 2020-2021 distinctly deviated from values predicted by seasonal autoregressive integrated moving average models: for 6 weeks within the first 13 weeks after the first COVID-19 infection was confirmed in the United States, the observed volume of searches was higher than the upper bound of 95% confidence intervals for predicted values. Similar results were observed for data on symptoms shared between influenza and COVID-19. The correlation between Google Trends influenza data and CDC influenza-like-illness data decreased after the emergence of COVID-19 (2020-2021: ρ=0.643; 2019-2020: ρ=0.902), while the correlation between Google Trends influenza data and influenza media coverage volume remained stable (2020-2021: ρ=0.746; 2019-2020: ρ=0.707). Conclusions Relevant differences were observed between predicted and observed influenza Google Trends data the year after the onset of the COVID-19 pandemic in the United States. Such differences are possibly due to media coverage, suggesting limitations to the use of Google Trends as a flu surveillance tool.
Affiliation(s)
- Owen Cai
- Shadow Creek High School, Pearland, US
- Bernardo Sousa-Pinto
- MEDCIDS - Department of Community Medicine, Information and Health Decision Sciences, Faculty of Medicine, University of Porto, Rua Plácido Costa s/n, Porto, PT
- CINTESIS - Center for Health Technologies and Services Research, University of Porto, Porto, PT
|
20
|
Benecke J, Benecke C, Ciutan M, Dosius M, Vladescu C, Olsavszky V. Retrospective analysis and time series forecasting with automated machine learning of ascariasis, enterobiasis and cystic echinococcosis in Romania. PLoS Negl Trop Dis 2021; 15:e0009831. [PMID: 34723982 PMCID: PMC8584970 DOI: 10.1371/journal.pntd.0009831] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/23/2020] [Revised: 11/11/2021] [Accepted: 09/22/2021] [Indexed: 12/04/2022]
Abstract
The epidemiology of neglected tropical diseases (NTD) is persistently underprioritized, despite NTD being widespread among the poorest populations and in the least developed countries on earth. This situation necessitates thorough and efficient public health intervention. Romania is on the brink of becoming a developed country. However, this South-Eastern European country appears to be a region that is susceptible to an underestimated burden of parasitic diseases despite recent public health reforms. Moreover, there is an evident lack of new epidemiologic data on NTD after Romania's accession to the European Union (EU) in 2007. Using the national ICD-10 dataset for hospitalized patients in Romania, we generated time series datasets for 2008-2018. The objective was to gain a deep understanding of the epidemiological distribution of three selected and highly endemic parasitic diseases, namely, ascariasis, enterobiasis and cystic echinococcosis (CE), during this period and to forecast their courses for the ensuing two years. Through descriptive and inferential analysis, we observed a decline in case numbers for all three NTD. Several distributional particularities at the regional level emerged. Furthermore, we performed predictions using a novel automated time series (AutoTS) machine learning tool, which showed a stable course for these parasitic NTD. Such predictions can help public health officials and medical organizations to implement targeted disease prevention and control. To our knowledge, this is the first study involving a retrospective analysis of ascariasis, enterobiasis and CE on a nationwide scale in Romania. It is also the first to use AutoTS technology for parasitic NTD.
Affiliation(s)
- Johannes Benecke
- Department of Dermatology, Venereology and Allergology, University Medical Center and Medical Faculty Mannheim, University of Heidelberg, and Center of Excellence in Dermatology, Mannheim, Germany
- Cornelius Benecke
- Barcelona Institute for Global Health, University of Barcelona, Barcelona, Spain
- Marius Ciutan
- National School of Public Health Management and Professional Development, Bucharest, Romania
- Mihnea Dosius
- National School of Public Health Management and Professional Development, Bucharest, Romania
- Cristian Vladescu
- National School of Public Health Management and Professional Development, Bucharest, Romania
- University Titu Maiorescu, Faculty of Medicine, Bucharest, Romania
- Victor Olsavszky
- Department of Dermatology, Venereology and Allergology, University Medical Center and Medical Faculty Mannheim, University of Heidelberg, and Center of Excellence in Dermatology, Mannheim, Germany
|
21
|
Lampos V, Mintz J, Qu X. An artificial intelligence approach for selecting effective teacher communication strategies in autism education. NPJ SCIENCE OF LEARNING 2021; 6:25. [PMID: 34471124 PMCID: PMC8410830 DOI: 10.1038/s41539-021-00102-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/21/2020] [Accepted: 07/29/2021] [Indexed: 06/13/2023]
Abstract
Effective inclusive education is key in promoting the long-term outcomes of children with autism spectrum conditions (ASC). However, no concrete consensus exists to guide teacher-student interactions in the classroom. In this work, we explore the potential of artificial intelligence as an approach in autism education to assist teachers in effective practice in developing social and educational outcomes for children with ASC. We form a protocol to systematically capture such interactions, and conduct a statistical analysis to uncover basic patterns in the collected observations, including the longer-term effect of specific teacher communication strategies on student response. In addition, we deploy machine learning techniques to predict student response given the form of communication used by teachers under specific classroom conditions and in relation to specified student attributes. Our analysis, drawn on a sample of 5460 coded interactions between teachers and seven students, sheds light on the varying effectiveness of different communication strategies and demonstrates the potential of this approach in making a contribution to autism education.
Affiliation(s)
- Vasileios Lampos
- Department of Computer Science, University College London, London, UK.
- Joseph Mintz
- Institute of Education, University College London, London, UK.
- Xiao Qu
- Institute of Education, University College London, London, UK
|
22
|
Global research interest regarding silver diamine fluoride in dentistry: A bibliometric analysis. J Dent 2021; 113:103778. [PMID: 34391874 DOI: 10.1016/j.jdent.2021.103778] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 04/28/2021] [Revised: 08/03/2021] [Accepted: 08/07/2021] [Indexed: 11/21/2022]
Abstract
OBJECTIVE This study aimed to investigate the global research interest regarding silver diamine fluoride (SDF) in dentistry using a bibliometric approach. METHODS A literature search was conducted in the Web of Science Core Collection database to identify studies related to SDF. Bibliometric data from the selected publications were exported and analysed using the Bibliometrix Biblioshiny R-package software. The type of research and main contents of the publications were summarised. One-way analysis of variance was used to detect differences in the citation counts of publications across types of research. In addition, Google Trends was used to investigate the popularity of the search term "silver diamine fluoride". RESULTS A total of 259 publications were included and analyzed. The annual scientific production of SDF studies increased significantly in the past five years, and it mainly concerned dental caries. The three main types of research were laboratory/animal study (n = 114, 44%), review/guideline (n = 56, 22%), and clinical trial (n = 44, 17%). The citation count was related to the type of research (p < 0.01). The citation count of clinical trials was significantly higher than that of laboratory/animal studies (p < 0.05). As quantified via data from Google Trends, the search popularity of "silver diamine fluoride" also increased significantly. CONCLUSION Based on the results of bibliometric analysis, global research interest regarding SDF has rapidly increased in recent years. CLINICAL SIGNIFICANCE This paper presents an overview of scientific evidence and impact of SDF use in dentistry. SDF attracts a growing interest globally and there has been a steady increase in scientific research into its use in dental practice.
|
23
|
Hswen Y, Yom-Tov E. Analysis of a Vaping-Associated Lung Injury Outbreak through Participatory Surveillance and Archival Internet Data. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:ijerph18158203. [PMID: 34360495 PMCID: PMC8346109 DOI: 10.3390/ijerph18158203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2021] [Revised: 07/28/2021] [Accepted: 07/30/2021] [Indexed: 11/22/2022]
Abstract
The US Centers for Disease Control and Prevention issued an alert about a suspected outbreak of lung illness associated with using E-cigarette products in September 2019. At the time that the CDC published its alert, little was known about the causes of the outbreak or who was at risk for it. Here we provide insights into the outbreak through analysis of passive reporting and participatory surveillance. We collected data about vaping habits and associated adverse reactions from four data sources pertaining to people in the USA: a participatory surveillance platform (YouVape), Reddit, Google Trends, and Bing. Data were analyzed to identify vaping behaviors and reported adverse events. These were correlated among sources and with prior reports. Data were obtained from 720 YouVape users, 4331 Reddit users, and over 1 million Bing users. Large geographic variation was observed across vaping products. Significant correlation was found among the data sources in reported adverse reactions. Models of participatory surveillance data found specific product and adverse reaction associations. Specifically, cannabidiol was found to be associated with fever, while tetrahydrocannabinol was found to be correlated with diarrhea. Our results demonstrate that utilization of different, complementary, online data sources provides a holistic view of vaping-associated lung injury while augmenting traditional data sources.
Affiliation(s)
- Yulin Hswen
- Department of Epidemiology and Biostatistics, University of California at San Francisco, San Francisco, CA 94158, USA
- Bakar Computational Health Sciences Institute, University of California at San Francisco, San Francisco, CA 94143, USA
- Innovation Program, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Elad Yom-Tov
- Microsoft Research Israel, 3 Alan Turing Str., Herzeliya 4672415, Israel
- Faculty of Industrial Engineering and Management, Technion, Haifa 3200000, Israel
|
24
|
Jang B, Kim I, Kim JW. Effective Training Data Extraction Method to Improve Influenza Outbreak Prediction from Online News Articles: Deep Learning Model Study. JMIR Med Inform 2021; 9:e23305. [PMID: 34032577 PMCID: PMC8188311 DOI: 10.2196/23305] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/07/2020] [Revised: 10/13/2020] [Accepted: 04/01/2021] [Indexed: 11/13/2022]
Abstract
Background Each year, influenza affects 3 to 5 million people and causes 290,000 to 650,000 fatalities worldwide. To reduce the fatalities caused by influenza, several countries have established influenza surveillance systems to collect early warning data. However, proper and timely warnings are hindered by a 1- to 2-week delay between the actual disease outbreaks and the publication of surveillance data. To address the issue, novel methods for influenza surveillance and prediction using real-time internet data (such as search queries, microblogging, and news) have been proposed. Some of the currently popular approaches extract online data and use machine learning to predict influenza occurrences in a classification mode. However, many of these methods extract training data subjectively, and it is difficult to capture the latent characteristics of the data correctly. There is a critical need to devise new approaches that focus on extracting training data by reflecting the latent characteristics of the data. Objective In this paper, we propose an effective method to extract training data in a manner that reflects the hidden features and improves the performance by filtering and selecting only the keywords related to influenza before the prediction. Methods Although word embedding provides a distributed representation of words by encoding the hidden relationships between various tokens, we enhanced the word embeddings by selecting keywords related to the influenza outbreak and sorting the extracted keywords using the Pearson correlation coefficient in order to solely keep the tokens with high correlation with the actual influenza outbreak. The keyword extraction process was followed by a predictive model based on long short-term memory that predicts the influenza outbreak. To assess the performance of the proposed predictive model, we used and compared a variety of word embedding techniques. 
Results Word embedding without our proposed sorting process showed 0.8705 prediction accuracy when 50.2 keywords were selected on average. Conversely, word embedding using our proposed sorting process showed 0.8868 prediction accuracy and an improvement in prediction accuracy of 12.6%, although smaller amounts of training data were selected, with only 20.6 keywords on average. Conclusions The sorting stage empowers the embedding process, which improves the feature extraction process because it acts as a knowledge base for the prediction component. The model outperformed other current approaches that use flat extraction before prediction.
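The correlation-based keyword sorting described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 threshold, and the toy weekly counts are all assumptions made for illustration, and the embedding and long short-term memory stages are omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant series has no defined correlation; treat it as 0.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def select_keywords(keyword_counts, outbreak_series, threshold=0.5):
    """Keep keywords whose counts correlate with the outbreak series, best first.

    `threshold` is a hypothetical cutoff, not a value taken from the paper.
    """
    scored = [(kw, pearson(counts, outbreak_series))
              for kw, counts in keyword_counts.items()]
    kept = [(kw, r) for kw, r in scored if r >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Toy weekly data (invented for illustration only).
weekly_outbreak = [10, 20, 40, 80, 60, 30]
candidates = {
    "fever": [5, 9, 22, 41, 30, 14],
    "vaccine": [30, 28, 25, 20, 22, 27],
    "weather": [7, 7, 8, 7, 8, 7],
}
selected = select_keywords(candidates, weekly_outbreak)
```

In this toy example only the keyword whose counts rise and fall with the outbreak series survives the filter, which mirrors the idea of training on fewer but more relevant keywords.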
Affiliation(s)
- Beakcheol Jang
  - Graduate School of Information, Yonsei University, Seoul, Republic of Korea
- Inhwan Kim
  - Graduate School of Information, Yonsei University, Seoul, Republic of Korea
- Jong Wook Kim
  - Department of Computer Science, Sangmyung University, Seoul, Republic of Korea
25
Poirier C, Hswen Y, Bouzillé G, Cuggia M, Lavenu A, Brownstein JS, Brewer T, Santillana M. Influenza forecasting for French regions combining EHR, web and climatic data sources with a machine learning ensemble approach. PLoS One 2021; 16:e0250890. [PMID: 34010293 PMCID: PMC8133501 DOI: 10.1371/journal.pone.0250890] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/10/2021] [Accepted: 04/16/2021] [Indexed: 11/25/2022] Open
Abstract
Effective and timely disease surveillance systems have the potential to help public health officials design interventions to mitigate the effects of disease outbreaks. Currently, healthcare-based disease monitoring systems in France offer influenza activity information that lags real time by one to three weeks. This temporal data gap introduces uncertainty that prevents public health officials from having a timely perspective on the population-level disease activity. Here, we present a machine-learning modeling approach that produces real-time estimates and short-term forecasts of influenza activity for the twelve continental regions of France by leveraging multiple disparate data sources that include Google search activity, real-time and local weather information, flu-related Twitter micro-blogs, electronic health records data, and historical disease activity synchronicities across regions. Our results show that all data sources contribute to improving influenza surveillance and that machine-learning ensembles that combine all data sources lead to accurate and timely predictions.
Affiliation(s)
- Canelle Poirier
  - INSERM, U1099, Rennes, France
  - Université de Rennes 1, LTSI, Rennes, France
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States of America
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, United States of America
  - * E-mail: (CP); (MS)
- Yulin Hswen
  - Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA, United States of America
  - Innovation Program, Boston Children’s Hospital, Boston, MA, United States of America
- Guillaume Bouzillé
  - INSERM, U1099, Rennes, France
  - Université de Rennes 1, LTSI, Rennes, France
  - CHU Rennes, Centre de Données Cliniques, Rennes, France
- Marc Cuggia
  - INSERM, U1099, Rennes, France
  - Université de Rennes 1, LTSI, Rennes, France
  - CHU Rennes, Centre de Données Cliniques, Rennes, France
- Audrey Lavenu
  - Université de Rennes 1, Faculté de médecine, Rennes, France
  - INSERM CIC 1414, Université de Rennes 1, Rennes, France
  - IRMAR, Institut de Recherche Mathématique de Rennes, Rennes, France
- John S. Brownstein
  - Innovation Program, Boston Children’s Hospital, Boston, MA, United States of America
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States of America
- Thomas Brewer
  - Innovation Program, Boston Children’s Hospital, Boston, MA, United States of America
- Mauricio Santillana
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States of America
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, United States of America
  - * E-mail: (CP); (MS)
26
Jun SP, Yoo HS, Lee JS. The impact of the pandemic declaration on public awareness and behavior: Focusing on COVID-19 google searches. TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE 2021; 166:120592. [PMID: 33776154 PMCID: PMC7978359 DOI: 10.1016/j.techfore.2021.120592] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/20/2020] [Revised: 01/06/2021] [Accepted: 01/07/2021] [Indexed: 05/28/2023]
Abstract
The unprecedented outbreaks of epidemics such as the coronavirus have caused major socio-economic changes. To analyze public risk awareness and behavior in response to the outbreak of epidemic diseases, this study focuses on RSV (Relative Search Volume) provided by Google Trends. This study uses the social big data provided by Google RSV to investigate how the WHO's pandemic declaration affected public awareness and behavior. A total of 37 OECD countries were analyzed and clustered according to the degree of reaction to the declaration, and the United States, France and Germany were selected for comparative study. The results of this study statistically confirmed that the pandemic declaration increased public awareness and had the effect of increasing searches for information on COVID-19 by more than 20%. In addition, this rapid rise in RSV also reflected interest in the COVID-19 test and had the effect of inducing individuals to be tested, which helped identify new cases. The significance of this study is that it provided the theoretical foundation for using RSV and its implications to understand and strategically utilize public awareness and behavior in situations where the WHO and governments must launch policies in response to the outbreak of new infectious diseases such as COVID-19.
Affiliation(s)
- Seung-Pyo Jun
  - Data Analysis Platform Center, Korea Institute of Science and Technology Information and Science & Technology Management Policy, University of Science & Technology (UST), 66, Hoegi-ro, Dongdaemun-gu, Seoul 130-741, Korea
- Hyoung Sun Yoo
  - Korea Institute of Science and Technology Information and Science & Technology Management Policy, University of Science & Technology (UST), 66, Hoegi-ro, Dongdaemun-gu, Seoul 130-741, Korea
- Jae-Seong Lee
  - Data Analysis Platform Center, Korea Institute of Science and Technology Information and Science & Technology Management Policy, University of Science & Technology (UST), 66, Hoegi-ro, Dongdaemun-gu, Seoul 130-741, Korea
27
Lampos V, Majumder MS, Yom-Tov E, Edelstein M, Moura S, Hamada Y, Rangaka MX, McKendry RA, Cox IJ. Tracking COVID-19 using online search. NPJ Digit Med 2021; 4:17. [PMID: 33558607 PMCID: PMC7870878 DOI: 10.1038/s41746-021-00384-w] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 07/02/2020] [Accepted: 12/24/2020] [Indexed: 12/30/2022] Open
Abstract
Previous research has demonstrated that various properties of infectious diseases can be inferred from online search behaviour. In this work we use time series of online search query frequencies to gain insights about the prevalence of COVID-19 in multiple countries. We first develop unsupervised modelling techniques based on associated symptom categories identified by the United Kingdom’s National Health Service and Public Health England. We then attempt to minimise an expected bias in these signals caused by public interest—as opposed to infections—using the proportion of news media coverage devoted to COVID-19 as a proxy indicator. Our analysis indicates that models based on online searches precede the reported confirmed cases and deaths by 16.7 (10.2–23.2) and 22.1 (17.4–26.9) days, respectively. We also investigate transfer learning techniques for mapping supervised models from countries where the spread of the disease has progressed extensively to countries that are in earlier phases of their respective epidemic curves. Furthermore, we compare time series of online search activity against confirmed COVID-19 cases or deaths jointly across multiple countries, uncovering interesting querying patterns, including the finding that rarer symptoms are better predictors than common ones. Finally, we show that web searches improve the short-term forecasting accuracy of autoregressive models for COVID-19 deaths. Our work provides evidence that online search data can be used to develop complementary public health surveillance methods to help inform the COVID-19 response in conjunction with more established approaches.
Affiliation(s)
- Vasileios Lampos
  - Department of Computer Science, University College London, London, UK
- Maimuna S Majumder
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Michael Edelstein
  - National Infection Service, Public Health England, London, UK
  - Department of Population Health, Faculty of Medicine, Bar-Ilan University, Safed, Israel
- Simon Moura
  - Department of Computer Science, University College London, London, UK
- Yohhei Hamada
  - Institute for Global Health, University College London, London, UK
- Molebogeng X Rangaka
  - Institute for Global Health, University College London, London, UK
  - Division of Epidemiology and Biostatistics, University of Cape Town, Cape Town, South Africa
- Rachel A McKendry
  - London Centre for Nanotechnology, University College London, London, UK
  - Division of Medicine, University College London, London, UK
- Ingemar J Cox
  - Department of Computer Science, University College London, London, UK
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
28
Kardeş S, Kuzu AS, Raiker R, Pakhchanian H, Karagülle M. Public interest in rheumatic diseases and rheumatologist in the United States during the COVID-19 pandemic: evidence from Google Trends. Rheumatol Int 2021; 41:329-334. [PMID: 33070255 PMCID: PMC7568841 DOI: 10.1007/s00296-020-04728-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 09/10/2020] [Accepted: 10/08/2020] [Indexed: 12/13/2022]
Abstract
This study evaluated public interest in rheumatic diseases during the coronavirus disease 2019 (COVID-19) pandemic. Google Trends was queried to analyze search trends in the United States for numerous rheumatic diseases and for the term "rheumatologist". Three 8-week periods in 2020 (March 15-May 9, May 10-July 4, and July 5-August 29) were compared to similar periods of the prior 4 years (2016-2019). Compared to a similar time period between 2016 and 2019, a significant decrease was found in the relative search volume for more than half of the search terms during the initial March 15-May 9, 2020 period. However, this trend appeared to reverse during the July 5-August 29, 2020 period, when the relative search volume for nearly half of the search terms did not differ significantly from similar periods of the prior 4 years. In addition, this period showed a significant increase in relative search volume for the terms axial spondyloarthritis, ankylosing spondylitis, psoriatic arthritis, rheumatoid arthritis, Sjögren's syndrome, antiphospholipid syndrome, scleroderma, Kawasaki disease, Anti-Neutrophil Cytoplasmic Antibody (ANCA)-associated vasculitis, and rheumatologist. There was a significant decrease in relative search volume for many rheumatic diseases between March 15 and May 9, 2020 when compared to similar periods during the prior 4 years. However, the trends reversed after the initial period ended. There was an increase in relative search volume for the term "rheumatologist" between July and August 2020, suggesting the need for rheumatologists during the COVID-19 pandemic. Policymakers and healthcare providers should address the informational demands on rheumatic diseases and needs for rheumatologists by the general public during pandemics like COVID-19.
Affiliation(s)
- Sinan Kardeş
  - Department of Medical Ecology and Hydroclimatology, Istanbul Faculty of Medicine, Istanbul University, Capa-Fatih, 34093 Istanbul, Turkey
- Ali Suat Kuzu
  - Department of Medical Ecology and Hydroclimatology, Istanbul Faculty of Medicine, Istanbul University, Capa-Fatih, 34093 Istanbul, Turkey
- Rahul Raiker
  - West Virginia University School of Medicine, Morgantown, WV USA
- Haig Pakhchanian
  - George Washington University School of Medicine & Health Science, Washington, DC USA
- Mine Karagülle
  - Department of Medical Ecology and Hydroclimatology, Istanbul Faculty of Medicine, Istanbul University, Capa-Fatih, 34093 Istanbul, Turkey
29
Mao K, Zhang H, Yang Z. An integrated biosensor system with mobile health and wastewater-based epidemiology (iBMW) for COVID-19 pandemic. Biosens Bioelectron 2020; 169:112617. [PMID: 32998066 PMCID: PMC7492834 DOI: 10.1016/j.bios.2020.112617] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 07/23/2020] [Revised: 09/02/2020] [Accepted: 09/14/2020] [Indexed: 12/20/2022]
Abstract
The outbreak of coronavirus disease (COVID-19) has caused a significant public health challenge worldwide. A lack of effective methods for screening potential patients, rapidly diagnosing suspected cases, and accurately monitoring the epidemic in real time to prevent the rapid spread of COVID-19 raises significant difficulties in mitigating the epidemic in many countries. As effective point-of-care diagnosis tools, simple, low-cost and rapid sensors have the potential to greatly accelerate the screening and diagnosis of suspected patients to improve their treatment and care. In particular, there is evidence that multiple pathogens have been detected in sewage, including SARS-CoV-2, providing significant opportunities for the development of advanced sensors for wastewater-based epidemiology that provide an early warning of the pandemic within the population. Sensors could be used to screen potential carriers, provide real-time monitoring and control of the epidemic, and even support targeted drug screening and delivery when integrated with emerging mobile health (mHealth) technology. In this communication, we discuss the feasibility of an integrated point-of-care biosensor system with mobile health for wastewater-based epidemiology (iBMW) for early warning of COVID-19, screening and diagnosis of potential infectors, and improving health care and public health. The iBMW will provide an effective approach to prevent, evaluate and intervene in a fast, affordable and reliable way, thus enabling real-time guidance for the government in providing effective intervention and evaluating the effectiveness of intervention.
Affiliation(s)
- Kang Mao
  - State Key Laboratory of Environmental Geochemistry, Institute of Geochemistry, Chinese Academy of Sciences, Guiyang, 550081, China
- Hua Zhang
  - State Key Laboratory of Environmental Geochemistry, Institute of Geochemistry, Chinese Academy of Sciences, Guiyang, 550081, China
- Zhugen Yang
  - Cranfield Water Science Institute, Cranfield University, Cranfield, MK43 0AL, United Kingdom
30
Johnson AK, Bhaumik R, Tabidze I, Mehta SD. Nowcasting Sexually Transmitted Infections in Chicago: Predictive Modeling and Evaluation Study Using Google Trends. JMIR Public Health Surveill 2020; 6:e20588. [PMID: 33151162 PMCID: PMC7677015 DOI: 10.2196/20588] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/22/2020] [Revised: 08/26/2020] [Accepted: 09/01/2020] [Indexed: 01/26/2023] Open
Abstract
Background Sexually transmitted infections (STIs) pose a significant public health challenge in the United States. Traditional surveillance systems are adversely affected by data quality issues, underreporting of cases, and reporting delays, resulting in missed prevention opportunities to respond to trends in disease prevalence. Search engine data can potentially facilitate an efficient and economical enhancement to surveillance reporting systems established for STIs. Objective We aimed to develop and train a predictive model using reported STI case data from Chicago, Illinois, and to investigate the model’s predictive capacity, timeliness, and ability to target interventions to subpopulations using Google Trends data. Methods Deidentified STI case data for chlamydia, gonorrhea, and primary and secondary syphilis from 2011-2017 were obtained from the Chicago Department of Public Health. The data set included race/ethnicity, age, and birth sex. Google Correlate was used to identify the top 100 correlated search terms with “STD symptoms,” and an autocrawler was established using Google Health Application Programming Interface to collect the search volume for each term. Elastic net regression was used to evaluate prediction accuracy, and cross-correlation analysis was used to identify timeliness of prediction. Subgroup elastic net regression analysis was performed for race, sex, and age. Results For gonorrhea and chlamydia, actual and predicted STI values correlated moderately in 2011 (chlamydia: r=0.65; gonorrhea: r=0.72) but correlated highly (chlamydia: r=0.90; gonorrhea: r=0.94) from 2012 to 2017. However, for primary and secondary syphilis, the high correlation was observed only for 2012 (r=0.79), 2013 (r=0.77), 2016 (r=0.80), and 2017 (r=0.84), with 2011, 2014, and 2015 showing moderate correlations (r=0.55-0.70). Model performance was the most accurate (highest correlation and lowest mean absolute error) for gonorrhea. 
Subgroup analyses improved model fit across disease and year. Regression models using search terms selected from the cross-correlation analysis improved the prediction accuracy and timeliness across diseases and years. Conclusions Integrating nowcasting with Google Trends in surveillance activities can potentially enhance the prediction and timeliness of outbreak detection and response as well as target interventions to subpopulations. Future studies should prospectively examine the utility of Google Trends applied to STI surveillance and response.
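The cross-correlation step mentioned in this abstract, used to judge how far ahead a search signal runs relative to reported cases, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's code: the function names, the `max_lag` parameter, and the toy series are hypothetical, and the elastic net regression stage is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def best_lead(search, cases, max_lag=4):
    """Return (lag, r) for the lag at which `search` best anticipates `cases`.

    A lag of k pairs search[t] with cases[t + k], so a positive lag means
    the search series leads the case series by k periods.
    """
    best_lag, best_r = 0, float("-inf")
    for lag in range(max_lag + 1):
        x = search[:len(search) - lag]   # drop the last `lag` search points
        y = cases[lag:]                  # drop the first `lag` case points
        r = pearson(x, y)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Toy series (invented): cases repeat the search curve two periods later.
search_volume = [1, 3, 7, 4, 2, 1, 1, 1]
reported_cases = [0, 0, 1, 3, 7, 4, 2, 1]
lag, r = best_lead(search_volume, reported_cases)
```

Because the toy case series is the search series shifted by two periods, the best lag found is 2. Real surveillance series are noisy, so in practice the peak correlation would be lower and the lag estimate less clear-cut.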
Affiliation(s)
- Amy Kristen Johnson
  - Ann & Robert H. Lurie Children's Hospital of Chicago, Chicago, IL, United States
  - Northwestern University, Chicago, IL, United States
- Runa Bhaumik
  - School of Public Health, University of Illinois at Chicago, Chicago, IL, United States
- Irina Tabidze
  - Chicago Department of Public Health, Chicago, IL, United States
- Supriya D Mehta
  - School of Public Health, University of Illinois at Chicago, Chicago, IL, United States
31
Hisada S, Murayama T, Tsubouchi K, Fujita S, Yada S, Wakamiya S, Aramaki E. Surveillance of early stage COVID-19 clusters using search query logs and mobile device-based location information. Sci Rep 2020; 10:18680. [PMID: 33122686 PMCID: PMC7596075 DOI: 10.1038/s41598-020-75771-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/10/2020] [Accepted: 10/01/2020] [Indexed: 12/18/2022] Open
Abstract
Two clusters of the coronavirus disease 2019 (COVID-19) were confirmed in Hokkaido, Japan, in February 2020. To identify these clusters, this study employed web search query logs of multiple devices and user location information from location-aware mobile devices. We anonymously identified users who used a web search engine (i.e., Yahoo! JAPAN) to search for COVID-19 or its symptoms. We regarded them as web searchers who were suspicious of their own COVID-19 infection (WSSCI). We extracted the location of WSSCI via a mobile operating system application and compared the spatio-temporal distribution of WSSCI with the actual location of the two known clusters. In the early stage of cluster development, we confirmed several WSSCI. Our approach was accurate in this stage and became biased after a public announcement of the cluster development. When other cluster-related resources, such as detailed population statistics, are not available, the proposed metric can capture hints of emerging clusters.
Affiliation(s)
- Shohei Hisada
  - Nara Institute of Science and Technology (NAIST), Nara, Japan
- Taichi Murayama
  - Nara Institute of Science and Technology (NAIST), Nara, Japan
- Shuntaro Yada
  - Nara Institute of Science and Technology (NAIST), Nara, Japan
- Shoko Wakamiya
  - Nara Institute of Science and Technology (NAIST), Nara, Japan
- Eiji Aramaki
  - Nara Institute of Science and Technology (NAIST), Nara, Japan
32
Wu J, Wang J, Nicholas S, Maitland E, Fan Q. Application of Big Data Technology for COVID-19 Prevention and Control in China: Lessons and Recommendations. J Med Internet Res 2020; 22:e21980. [PMID: 33001836 PMCID: PMC7561444 DOI: 10.2196/21980] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 06/30/2020] [Revised: 07/28/2020] [Accepted: 09/14/2020] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND In the prevention and control of infectious diseases, previous research on the application of big data technology has mainly focused on the early warning and early monitoring of infectious diseases. Although the application of big data technology for COVID-19 warning and monitoring remains an important task, prevention of the disease's rapid spread and reduction of its impact on society are currently the most pressing challenges for the application of big data technology during the COVID-19 pandemic. After the outbreak of COVID-19 in Wuhan, the Chinese government and nongovernmental organizations actively used big data technology to prevent, contain, and control the spread of COVID-19. OBJECTIVE The aim of this study is to discuss the application of big data technology to prevent, contain, and control COVID-19 in China; draw lessons; and make recommendations. METHODS We discuss the data collection methods and key data information that existed in China before the outbreak of COVID-19 and how these data contributed to the prevention and control of COVID-19. Next, we discuss China's new data collection methods and new information assembled after the outbreak of COVID-19. Based on the data and information collected in China, we analyzed the application of big data technology from the perspectives of data sources, data application logic, data application level, and application results. In addition, we analyzed the issues, challenges, and responses encountered by China in the application of big data technology from four perspectives: data access, data use, data sharing, and data protection. Suggestions for improvements are made for data collection, data circulation, data innovation, and data security to help understand China's response to the epidemic and to provide lessons for other countries' prevention and control of COVID-19. 
RESULTS In the process of the prevention and control of COVID-19 in China, big data technology has played an important role in personal tracking, surveillance and early warning, tracking of the virus's sources, drug screening, medical treatment, resource allocation, and production recovery. The data used included location and travel data, medical and health data, news media data, government data, online consumption data, data collected by intelligent equipment, and epidemic prevention data. We identified a number of big data problems including low efficiency of data collection, difficulty in guaranteeing data quality, low efficiency of data use, lack of timely data sharing, and data privacy protection issues. To address these problems, we suggest unified data collection standards, innovative use of data, accelerated exchange and circulation of data, and a detailed and rigorous data protection system. CONCLUSIONS China has used big data technology to prevent and control COVID-19 in a timely manner. To prevent and control infectious diseases, countries must collect, clean, and integrate data from a wide range of sources; use big data technology to analyze a wide range of big data; create platforms for data analyses and sharing; and address privacy issues in the collection and use of big data.
Affiliation(s)
- Jun Wu
  - Dong Fureng Institute of Economic and Social Development, Wuhan University, Wuhan, China
- Jian Wang
  - Dong Fureng Institute of Economic and Social Development, Wuhan University, Beijing, China
- Stephen Nicholas
  - Australian National Institute of Management and Commerce, Sydney, Australia
  - Newcastle Business School, University of Newcastle, Newcastle, Australia
- Elizabeth Maitland
  - School of Management, University of Liverpool, Liverpool, United Kingdom
- Qiuyan Fan
  - Dong Fureng Institute of Economic and Social Development, Wuhan University, Wuhan, China
33
Budd J, Miller BS, Manning EM, Lampos V, Zhuang M, Edelstein M, Rees G, Emery VC, Stevens MM, Keegan N, Short MJ, Pillay D, Manley E, Cox IJ, Heymann D, Johnson AM, McKendry RA. Digital technologies in the public-health response to COVID-19. Nat Med 2020; 26:1183-1192. [DOI: 10.1038/s41591-020-1011-4] [Citation(s) in RCA: 485] [Impact Index Per Article: 97.0] [Received: 04/07/2020] [Accepted: 07/02/2020] [Indexed: 12/23/2022]
34
Time Series Analysis and Forecasting with Automated Machine Learning on a National ICD-10 Database. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:ijerph17144979. [PMID: 32664331 PMCID: PMC7400312 DOI: 10.3390/ijerph17144979] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/31/2020] [Revised: 06/29/2020] [Accepted: 07/07/2020] [Indexed: 12/22/2022]
Abstract
The application of machine learning (ML) for generating insights and making predictions on new records continues to expand within the medical community. Despite this progress, the application of time series analysis has remained underexplored due to the complexity of the underlying techniques. In this study, we have deployed a novel ML approach, automated time series (AutoTS) machine learning, to automate data processing and the application of a multitude of models to assess which best forecasts future values. This rapid experimentation enables the selection of the most accurate model for performing time series predictions. By using the nation-wide ICD-10 (International Classification of Diseases, Tenth Revision) dataset of hospitalized patients of Romania, we have generated time series datasets over the period of 2008–2018 and performed highly accurate AutoTS predictions for the ten deadliest diseases. Forecast results for the years 2019 and 2020 were generated on a NUTS 2 (Nomenclature of Territorial Units for Statistics) regional level. This is the first study to our knowledge to perform time series forecasting of multiple diseases at a regional level using automated time series machine learning on a national ICD-10 dataset. The deployment of AutoTS technology can help decision makers in implementing targeted national health policies more efficiently.
35
Caldwell WK, Fairchild G, Del Valle SY. Surveilling Influenza Incidence With Centers for Disease Control and Prevention Web Traffic Data: Demonstration Using a Novel Dataset. J Med Internet Res 2020; 22:e14337. [PMID: 32437327 PMCID: PMC7367534 DOI: 10.2196/14337] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/10/2019] [Revised: 01/29/2020] [Accepted: 03/22/2020] [Indexed: 11/23/2022] Open
Abstract
Background Influenza epidemics result in a public health and economic burden worldwide. Traditional surveillance techniques, which rely on doctor visits, provide data with a delay of 1 to 2 weeks. A means of obtaining real-time data and forecasting future outbreaks is desirable to provide more timely responses to influenza epidemics. Objective This study aimed to present the first implementation of a novel dataset by demonstrating its ability to supplement traditional disease surveillance at multiple spatial resolutions. Methods We used internet traffic data from the Centers for Disease Control and Prevention (CDC) website to determine the potential usability of this data source. We tested the traffic generated by 10 influenza-related pages in 8 states and 9 census divisions within the United States and compared it against clinical surveillance data. Results Our results yielded an r² value of 0.955 in the most successful case, promising results for some cases, and unsuccessful results for other cases. In the interest of scientific transparency to further the understanding of when internet data streams are an appropriate supplemental data source, we also included negative results (ie, unsuccessful models). Models that focused on a single influenza season were more successful than those that attempted to model multiple influenza seasons. Geographic resolution appeared to play a key role, with national and regional models being more successful, overall, than models at the state level. Conclusions These results demonstrate that internet data may be able to complement traditional influenza surveillance in some cases but not in others. Specifically, our results show that the CDC website traffic may inform national- and division-level models but not models for each individual state. In addition, our results show better agreement when the data were broken up by seasons instead of aggregated over several years. 
We anticipate that this work will lead to more complex nowcasting and forecasting models using this data stream.
Affiliation(s)
- Wendy K Caldwell
  - X Computational Physics Division, Los Alamos National Laboratory, Los Alamos, NM, United States
  - School of Mathematical and Statistical Sciences, Arizona State University, Tempe, AZ, United States
- Geoffrey Fairchild
  - Analytics, Intelligence, and Technology Division, Los Alamos National Laboratory, Los Alamos, NM, United States
- Sara Y Del Valle
  - Analytics, Intelligence, and Technology Division, Los Alamos National Laboratory, Los Alamos, NM, United States
36
Murayama T, Shimizu N, Fujita S, Wakamiya S, Aramaki E. Robust two-stage influenza prediction model considering regular and irregular trends. PLoS One 2020; 15:e0233126. [PMID: 32437380 PMCID: PMC7241782 DOI: 10.1371/journal.pone.0233126] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/18/2019] [Accepted: 04/28/2020] [Indexed: 11/18/2022] Open
Abstract
Influenza causes numerous deaths worldwide every year. Predicting the number of influenza patients is an important task for medical institutions. Two types of data regarding influenza-like illnesses (ILIs) are often used for flu prediction: (1) historical data and (2) user-generated content (UGC) data on the web, such as search queries and tweets. Historical data perform well under normal conditions but poorly for irregular phenomena. In contrast, UGC data are advantageous for irregular phenomena. So far, no effective model providing the benefits of both types of data has been devised. This study proposes a novel model, designated the two-stage model, which combines both historical and UGC data. The basic idea is that regular trends are first estimated using the historical data-based model, and irregular trends are then predicted by the UGC data-based model. Our approach is practically useful because the two models can be trained separately. Thus, if a UGC provider changes its service, our model could still produce better performance because the first part of the model remains stable. Experiments on the US and Japan datasets demonstrated the basic feasibility of the proposed approach. In the dropout (pseudo-noise) test, which assumes that a UGC service would change, the proposed method also showed robustness against outliers. The proposed model is suitable for prediction of seasonal flu.
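The two-stage idea in the abstract above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' model: stage 1 is assumed to be a per-week average over past seasons, stage 2 a one-variable least-squares fit of the stage 1 residuals on a UGC signal, and all names and numbers are hypothetical.

```python
def fit_seasonal_baseline(history):
    """Stage 1: mean historical value for each week index.

    history is a list of past seasons, each a list of weekly values
    of the same length.
    """
    n_weeks = len(history[0])
    return [sum(season[w] for season in history) / len(history)
            for w in range(n_weeks)]


def fit_residual_slope(residuals, ugc):
    """Stage 2: least-squares slope (through the origin) of residuals on UGC."""
    denom = sum(u * u for u in ugc)
    if denom == 0:
        return 0.0
    return sum(r * u for r, u in zip(residuals, ugc)) / denom


def two_stage_predict(baseline, slope, week, ugc_value):
    """Regular trend from stage 1 plus the irregular correction from stage 2."""
    return baseline[week] + slope * ugc_value


# Hypothetical data, made up for illustration only.
past_seasons = [[1.0, 2.0, 3.0, 2.0], [1.2, 2.2, 3.4, 1.8]]
baseline = fit_seasonal_baseline(past_seasons)

observed_so_far = [1.3, 2.6, 4.0]   # current season, weeks 0 to 2
ugc_signal = [0.2, 0.5, 0.8]        # e.g. centered search or tweet volume
residuals = [obs - baseline[w] for w, obs in enumerate(observed_so_far)]
slope = fit_residual_slope(residuals, ugc_signal)

print(two_stage_predict(baseline, slope, week=3, ugc_value=0.4))
```

Because the two stages are fitted independently, setting the slope to zero (for example when the UGC source becomes unavailable) leaves the historical baseline usable on its own.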
Affiliation(s)
- Taichi Murayama
  - Nara Institute of Science and Technology (NAIST), Ikoma-city, Japan
- Shoko Wakamiya
  - Nara Institute of Science and Technology (NAIST), Ikoma-city, Japan
- Eiji Aramaki
  - Nara Institute of Science and Technology (NAIST), Ikoma-city, Japan
37
Ackley SF, Pilewski S, Petrovic VS, Worden L, Murray E, Porco TC. Assessing the utility of a smart thermometer and mobile application as a surveillance tool for influenza and influenza-like illness. Health Informatics J 2020; 26:2148-2158. [PMID: 31969046 DOI: 10.1177/1460458219897152] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 11/16/2022]
Abstract
Kinsa Inc. sells Food and Drug Administration-cleared smart thermometers, which synchronize with a mobile application and may aid influenza forecasting efforts. We compare smart thermometer and mobile application data to regional influenza and influenza-like illness surveillance data from the California Department of Public Health. We evaluated the correlation between the regional California surveillance data and smart thermometer data, tested the hypothesis that smart thermometer readings and symptom reports provide regionally specific predictions, and determined whether smart thermometer and mobile application data improved disease forecasts. Smart thermometer readings are highly correlated with regional surveillance data, are more predictive of surveillance data for their own region and season than for other times and places, and improve predictions of influenza, but not predictions of influenza-like illness. These results are consistent with the hypothesis that smart thermometer readings and symptom reports reflect underlying disease transmission in California. Data from such cloud-based devices could supplement syndromic influenza surveillance data.
Affiliation(s)
- Lee Worden
  - University of California, San Francisco, USA
38
Tideman S, Santillana M, Bickel J, Reis B. Internet search query data improve forecasts of daily emergency department volume. J Am Med Inform Assoc 2019; 26:1574-1583. [PMID: 31730701 PMCID: PMC7647136 DOI: 10.1093/jamia/ocz154] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/20/2019] [Revised: 07/25/2019] [Accepted: 08/06/2019] [Indexed: 11/15/2022]
Abstract
OBJECTIVE Emergency departments (EDs) are increasingly overcrowded. Forecasting patient visit volume is challenging. Reliable and accurate forecasting strategies may help improve resource allocation and mitigate the effects of overcrowding. Patterns related to weather, day of the week, season, and holidays have been previously used to forecast ED visits. Internet search activity has proven useful for predicting disease trends and offers a new opportunity to improve ED visit forecasting. This study tests whether Google search data and relevant statistical methods can improve the accuracy of ED volume forecasting compared with traditional data sources. MATERIALS AND METHODS Seven years of historical daily ED arrivals were collected from Boston Children's Hospital. We used data from the public school calendar, National Oceanic and Atmospheric Administration, and Google Trends. Multiple linear models using LASSO (least absolute shrinkage and selection operator) for variable selection were created. The models were trained on 5 years of data and out-of-sample accuracy was judged using multiple error metrics on the final 2 years. RESULTS All data sources added complementary predictive power. Our baseline day-of-the-week model recorded an average percent error of 10.99%. Autoregressive terms, calendar, and weather data reduced the error to 7.71%. Search volume data reduced the error to 7.58%, theoretically preventing 4 improperly staffed days. DISCUSSION The predictive power provided by the search volume data may stem from the ability to capture population-level interaction with events, such as winter storms and infectious diseases, that traditional data sources alone miss. CONCLUSIONS This study demonstrates that search volume data can meaningfully improve forecasting of ED visit volume and could help improve quality and reduce cost.
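As a rough sketch of the baseline described in the abstract above, a day-of-the-week model can be fitted on an earlier period and scored on a later one. The choice of mean absolute percent error as the metric is an assumption (the abstract says only "average percent errors"), and the function names and toy numbers are hypothetical.

```python
from collections import defaultdict


def fit_day_of_week_baseline(train):
    """Mean arrivals per weekday; train is a list of (weekday, arrivals) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for weekday, arrivals in train:
        totals[weekday] += arrivals
        counts[weekday] += 1
    return {day: totals[day] / counts[day] for day in totals}


def mean_absolute_percent_error(actual, predicted):
    """Average of |actual - predicted| / actual, in percent."""
    ratios = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical (weekday, arrivals) records: fit on the earlier period,
# evaluate out of sample on the later one.
train = [(0, 100), (1, 80), (0, 120), (1, 100)]
test = [(0, 100), (1, 100)]

baseline = fit_day_of_week_baseline(train)
predicted = [baseline[day] for day, _ in test]
actual = [arrivals for _, arrivals in test]
print(mean_absolute_percent_error(actual, predicted))
```

Richer models (autoregressive terms, calendar, weather, search volume) would be scored with the same function on the same held-out period so that the error figures are comparable.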
Affiliation(s)
- Sam Tideman
  - Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA
- Mauricio Santillana
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA
  - Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Jonathan Bickel
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA
  - Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Ben Reis
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA
  - Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
  - Predictive Medicine Group, Boston Children’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
39
Abstract
Google searches are now a popular way for individuals to seek information about the significance of common symptoms and whether they should seek medical assistance. As analysis of search patterns may help understand the demand for medical care, we examined what times over a 24-hour period and on what days of the week people searched Google for information about common symptoms.
We analysed Google searches for symptoms in the United Kingdom during the week from July 30 to August 5, 2018 using Google Trends. We recorded the time points with the highest search volume for 50 common symptoms relative to other searches, and the day of the week with the highest search peak for each particular symptom.
All of the peak searches for the symptoms we examined occurred during the night, between 10pm and 8am. The majority, 32/50 (64%), occurred between 3am and 6am, with 12/50 (24%) between midnight and 3am. Most symptom searches were more common during the week and least common during the weekend. Typically, searches for a particular symptom peaked at a similar time each night over the week.
Searches for symptoms are significantly more common during night-time hours, and particularly between 3 and 6am. Symptom searches show relatively stable diurnal and weekly patterns.
Google searches for health information are common and individuals regularly search for their specific symptoms before deciding whether to seek medical care.
Searches for common symptoms are significantly more likely to occur, relative to other searches, during the night-time hours and are highest during the working week and lowest at weekends.
The majority of symptom searches show relatively stable diurnal and weekly patterns.
40
Kandula S, Pei S, Shaman J. Improved forecasts of influenza-associated hospitalization rates with Google Search Trends. J R Soc Interface 2019; 16:20190080. [PMID: 31185818 PMCID: PMC6597779 DOI: 10.1098/rsif.2019.0080] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/17/2022]
Abstract
Reliable forecasts of influenza-associated hospitalizations during seasonal outbreaks can help health systems better prepare for patient surges. Within the USA, public health surveillance systems collect and distribute near real-time weekly hospitalization rates, a key observational metric that makes real-time forecasts of this outcome possible. In this paper, we describe a method to forecast hospitalization rates using a population-level transmission model in combination with a data assimilation technique. Using this method, we generated retrospective forecasts of hospitalization rates for five age groups and the overall population during five seasons in the USA and quantified forecast accuracy for both near-term and seasonal targets. Additionally, we describe methods to correct for under-reporting of hospitalization rates (backcast) and to estimate hospitalization rates from publicly available online search trends data (nowcast). Forecasts based on surveillance rates alone were reasonably accurate in predicting peak hospitalization rates (within ± 25% of the actual peak rate, three weeks before peak). The error in predicting rates one to four weeks ahead remained constant for the duration of the seasons, even during periods of increased influenza incidence. An improvement in forecast quality across all age groups, seasons and targets was observed when backcasts and nowcasts supplemented surveillance data. These results suggest that the model-inference framework can provide reasonably accurate real-time forecasts of influenza hospitalizations; backcasts and nowcasts offer a way to improve system tolerance to observational errors.
Affiliation(s)
- Sasikiran Kandula
  - Department of Environmental Health Sciences, Columbia University, New York, NY 10032, USA
- Sen Pei
  - Department of Environmental Health Sciences, Columbia University, New York, NY 10032, USA
- Jeffrey Shaman
  - Department of Environmental Health Sciences, Columbia University, New York, NY 10032, USA
41
He H, Henderson J, Ho JC. Distributed Tensor Decomposition for Large Scale Health Analytics. PROCEEDINGS OF THE ... INTERNATIONAL WORLD-WIDE WEB CONFERENCE. INTERNATIONAL WWW CONFERENCE 2019; 2019:659-669. [PMID: 31198910 PMCID: PMC6563812 DOI: 10.1145/3308558.3313548] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
In the past few decades, there has been rapid growth in the quantity and variety of healthcare data. These large sets of data are usually high dimensional (e.g. patients, their diagnoses, and medications to treat their diagnoses) and cannot be adequately represented as matrices. Thus, many existing algorithms cannot analyze them. To accommodate these high dimensional data, tensor factorization, which can be viewed as a higher-order extension of methods like PCA, has attracted much attention and emerged as a promising solution. However, tensor factorization is a computationally expensive task, and existing methods developed to factor large tensors are not flexible enough for real-world situations. To address this scaling problem more efficiently, we introduce SGranite, a distributed, scalable, and sparse tensor factorization method fit through stochastic gradient descent. SGranite offers three contributions: (1) Scalability: it employs a block partitioning and parallel processing design and thus scales to large tensors, (2) Accuracy: we show that our method can achieve results faster without sacrificing the quality of the tensor decomposition, and (3) Flexible constraints: we show our approach can encompass various kinds of constraints including l2 norm, l1 norm, and logistic regularization. We demonstrate SGranite's capabilities in two real-world use cases. In the first, we use Google searches for flu-like symptoms to characterize and predict influenza patterns. In the second, we use SGranite to extract clinically interesting sets (i.e., phenotypes) of patients from electronic health records. Through these case studies, we show SGranite has the potential to be used to rapidly characterize, predict, and manage large multimodal datasets, thereby promising a novel, data-driven solution that can benefit very large segments of the population.
Affiliation(s)
- Huan He
  - Emory University, Atlanta, Georgia
42
Ning S, Yang S, Kou SC. Accurate regional influenza epidemics tracking using Internet search data. Sci Rep 2019; 9:5238. [PMID: 30918276 PMCID: PMC6437143 DOI: 10.1038/s41598-019-41559-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 06/25/2018] [Accepted: 03/12/2019] [Indexed: 12/12/2022]
Abstract
Accurate, high-resolution tracking of influenza epidemics at the regional level helps public health agencies make informed and proactive decisions, especially in the face of outbreaks. Internet users' online searches offer great potential for the regional tracking of influenza. However, due to the complex data structure and reduced quality of Internet data at the regional level, few established methods provide satisfactory performance. In this article, we propose a novel method named ARGO2 (2-step Augmented Regression with GOogle data) that efficiently combines publicly available Google search data at different resolutions (national and regional) with traditional influenza surveillance data from the Centers for Disease Control and Prevention (CDC) for accurate, real-time regional tracking of influenza. ARGO2 gives very competitive performance across all US regions compared with available Internet-data-based regional influenza tracking methods, and it has achieved 30% error reduction over the best alternative method that we numerically tested for the period of March 2009 to March 2018. ARGO2 is reliable and robust, with the flexibility to incorporate additional information from other sources and resolutions, making it a powerful tool for regional influenza tracking, and potentially for tracking other social, economic, or public health events at the regional or local level.
Affiliation(s)
- Shaoyang Ning
  - Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, 02138, MA, USA
- Shihao Yang
  - Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, 02138, MA, USA
- S C Kou
  - Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, 02138, MA, USA
43
Taking connected mobile-health diagnostics of infectious diseases to the field. Nature 2019; 566:467-474. [PMID: 30814711 DOI: 10.1038/s41586-019-0956-2] [Citation(s) in RCA: 192] [Impact Index Per Article: 32.0] [Received: 11/30/2017] [Accepted: 08/08/2018] [Indexed: 11/08/2022]
Abstract
Mobile health, or 'mHealth', is the application of mobile devices, their components and related technologies to healthcare. It is already improving patients' access to treatment and advice. Now, in combination with internet-connected diagnostic devices, it offers novel ways to diagnose, track and control infectious diseases and to improve the efficiency of the health system. Here we examine the promise of these technologies and discuss the challenges in realizing their potential to increase patients' access to testing, aid in their treatment and improve the capability of public health authorities to monitor outbreaks, implement response strategies and assess the impact of interventions across the world.
44
Osthus D, Daughton AR, Priedhorsky R. Even a good influenza forecasting model can benefit from internet-based nowcasts, but those benefits are limited. PLoS Comput Biol 2019; 15:e1006599. [PMID: 30707689 PMCID: PMC6373968 DOI: 10.1371/journal.pcbi.1006599] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 05/21/2018] [Revised: 02/13/2019] [Accepted: 10/30/2018] [Indexed: 11/19/2022]
Abstract
The ability to produce timely and accurate flu forecasts in the United States can significantly impact public health. Augmenting forecasts with internet data has shown promise for improving forecast accuracy and timeliness in controlled settings, but results in practice are less convincing, as models augmented with internet data have not consistently outperformed models without internet data. In this paper, we perform a controlled experiment, taking into account data backfill, to improve clarity on the benefits and limitations of augmenting an already good flu forecasting model with internet-based nowcasts. Our results show that a good flu forecasting model can benefit from the augmentation of internet-based nowcasts in practice for all considered public health-relevant forecasting targets. The degree of forecast improvement due to nowcasting, however, is uneven across forecasting targets, with short-term forecasting targets seeing the largest improvements and seasonal targets such as the peak timing and intensity seeing relatively marginal improvements. The uneven forecasting improvements across targets hold even when "perfect" nowcasts are used. These findings suggest that further improvements to flu forecasting, particularly seasonal targets, will need to derive from other, non-nowcasting approaches.
Affiliation(s)
- Dave Osthus
  - Los Alamos National Laboratory, Los Alamos, New Mexico, USA
- Ashlynn R. Daughton
  - Los Alamos National Laboratory, Los Alamos, New Mexico, USA
  - University of Colorado Boulder, Boulder, Colorado, USA
45
Kandula S, Shaman J. Near-term forecasts of influenza-like illness: An evaluation of autoregressive time series approaches. Epidemics 2019; 27:41-51. [PMID: 30792135 DOI: 10.1016/j.epidem.2019.01.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/15/2018] [Revised: 12/15/2018] [Accepted: 01/16/2019] [Indexed: 10/27/2022]
Abstract
Seasonal influenza in the United States is estimated to cause 9-35 million illnesses annually, with resultant economic burden amounting to $47-$150 billion. Reliable real-time forecasts of influenza can help public health agencies better manage these outbreaks. Here, we investigate the feasibility of three autoregressive methods for near-term forecasts: an Autoregressive Integrated Moving Average (ARIMA) model with time-varying order; an ARIMA model fit to seasonally adjusted incidence rates (ARIMA-STL); and a feed-forward autoregressive artificial neural network with a single hidden layer (AR-NN). We generated retrospective forecasts for influenza incidence one to four weeks in the future at the US national level and for 10 US regions during 5 influenza seasons. We compared the relative accuracy of the point and probabilistic forecasts of the three models with respect to each other and in relation to two large external validation sets that each comprise at least 20 other models. Both the probabilistic and point forecasts of AR-NN were found to be more accurate than those of the other two models overall. An additional sub-analysis found that the three models benefitted considerably from the use of a search trends-based 'nowcast' as a proxy for surveillance data, and these three models with use of nowcasts were found to be the highest ranked models in both validation datasets. When the nowcasts were withheld, the three models remained competitive relative to models in the validation sets. The difference in accuracy among the three models, and relative to models of the validation sets, was found to be largely statistically significant. Our results suggest that autoregressive models, even when not equipped to capture transmission dynamics, can provide reasonably accurate near-term forecasts for influenza. Existing support in open-source libraries makes them suitable non-naïve baselines for model comparison studies and for operational forecasts in resource-constrained settings where more sophisticated methods may not be feasible.
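To illustrate the general shape of an autoregressive near-term forecast, the sketch below fits a first-order model by least squares and iterates it one to four weeks ahead. This is a deliberately simpler stand-in, not the ARIMA, ARIMA-STL, or AR-NN models evaluated in the study; the function names and the input series are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        # Constant lagged values: fall back to predicting the mean.
        return mean_y, 0.0
    phi = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return mean_y - phi * mean_x, phi


def forecast_ahead(series, horizon=4):
    """Iterate the fitted AR(1) recursion to get 1..horizon step forecasts."""
    c, phi = fit_ar1(series)
    forecasts = []
    last = series[-1]
    for _ in range(horizon):
        last = c + phi * last
        forecasts.append(last)
    return forecasts


# Hypothetical weekly ILI percentages, made up for illustration only.
weekly_ili = [1.0, 1.2, 1.5, 1.9, 2.4, 2.8]
print(forecast_ahead(weekly_ili, horizon=4))
```

Replacing the most recent observation with a nowcast value before calling `forecast_ahead` would be one simple way to mimic the abstract's use of nowcasts as a proxy for surveillance data.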
Affiliation(s)
- Sasikiran Kandula
  - Department of Environmental Health Sciences, Columbia University, New York, NY, United States
- Jeffrey Shaman
  - Department of Environmental Health Sciences, Columbia University, New York, NY, United States
46
Lu FS, Hattab MW, Clemente CL, Biggerstaff M, Santillana M. Improved state-level influenza nowcasting in the United States leveraging Internet-based data and network approaches. Nat Commun 2019; 10:147. [PMID: 30635558 PMCID: PMC6329822 DOI: 10.1038/s41467-018-08082-0] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 07/25/2018] [Accepted: 12/12/2018] [Indexed: 12/01/2022]
Abstract
In the presence of health threats, precision public health approaches aim to provide targeted, timely, and population-specific interventions. Accurate surveillance methodologies that can estimate infectious disease activity ahead of official healthcare-based reports, at relevant spatial resolutions, are important for achieving this goal. Here we introduce a methodological framework which dynamically combines two distinct influenza tracking techniques, using an ensemble machine learning approach, to achieve improved state-level influenza activity estimates in the United States. The two predictive techniques behind the ensemble utilize (1) a self-correcting statistical method combining influenza-related Google search frequencies, information from electronic health records, and historical flu trends within each state, and (2) a network-based approach leveraging spatio-temporal synchronicities observed in historical influenza activity across states. The ensemble considerably outperforms each component method in addition to previously proposed state-specific methods for influenza tracking, with higher correlations and lower prediction errors.
Affiliation(s)
- Fred S Lu
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02115, USA
- Mohammad W Hattab
  - Wyss Institute for Biologically Inspired Engineering, Harvard Medical School, Boston, MA, 02115, USA
- Matthew Biggerstaff
  - Influenza Division, National Center for Immunization and Respiratory Disease, Centers for Disease Control and Prevention, Atlanta, GA, 30333, USA
- Mauricio Santillana
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02115, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
47
Zhang Q, Chai Y, Li X, Young SD, Zhou J. Using internet search data to predict new HIV diagnoses in China: a modelling study. BMJ Open 2018; 8:e018335. [PMID: 30337302 PMCID: PMC6196849 DOI: 10.1136/bmjopen-2017-018335] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 06/21/2017] [Revised: 06/18/2018] [Accepted: 08/20/2018] [Indexed: 02/06/2023]
Abstract
OBJECTIVES Internet data are important sources of abundant information regarding HIV epidemics and risk factors. A number of case studies found an association between internet searches and outbreaks of infectious diseases, including HIV. In this research, we examined the feasibility of using search query data to predict the number of new HIV diagnoses in China. DESIGN We identified a set of search queries that are associated with new HIV diagnoses in China. We developed statistical models (negative binomial generalised linear model and its Bayesian variants) to estimate the number of new HIV diagnoses by using data of search queries (Baidu) and official statistics (for the entire country and for Guangdong province) for 7 years (2010 to 2016). RESULTS Search query data were positively associated with the number of new HIV diagnoses in China and in Guangdong province. Experiments demonstrated that incorporating search query data could improve the prediction performance in nowcasting and forecasting tasks. CONCLUSIONS Baidu data can be used to predict the number of new HIV diagnoses in China up to the province level. This study demonstrates the feasibility of using search query data to predict new HIV diagnoses. Results could potentially facilitate timely evidence-based decision making and complement conventional programmes for HIV prevention.
Affiliation(s)
- Qingpeng Zhang
  - Department of Systems Engineering and Engineering Management, City University of Hong Kong, Kowloon, Hong Kong SAR, China
  - City University of Hong Kong Shenzhen Research Institute, Shenzhen, China
- Yi Chai
  - Department of Systems Engineering and Engineering Management, City University of Hong Kong, Kowloon, Hong Kong SAR, China
  - Department of Social Work and Social Administration, The University of Hong Kong, Hong Kong, Hong Kong SAR, China
- Xiaoming Li
  - Arnold School of Public Health, University of South Carolina, Columbia, South Carolina, USA
- Sean D Young
  - University of California Institute for Prediction Technology, Department of Family Medicine, University of California Los Angeles, Los Angeles, California, USA
- Jiaqi Zhou
  - Department of Systems Engineering and Engineering Management, City University of Hong Kong, Kowloon, Hong Kong SAR, China
48
Wagner M, Lampos V, Cox IJ, Pebody R. The added value of online user-generated content in traditional methods for influenza surveillance. Sci Rep 2018; 8:13963. [PMID: 30228285 PMCID: PMC6143510 DOI: 10.1038/s41598-018-32029-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 03/12/2018] [Accepted: 08/28/2018] [Indexed: 11/09/2022]
Abstract
There has been considerable work in evaluating the efficacy of using online data for health surveillance. Often comparisons with baseline data involve various squared error and correlation metrics. While useful, these overlook a variety of other factors important to public health bodies considering the adoption of such methods. In this paper, a proposed surveillance system that incorporates models based on recent research efforts is evaluated in terms of its added value for influenza surveillance at Public Health England. The system comprises two supervised learning approaches trained on influenza-like illness (ILI) rates provided by the Royal College of General Practitioners (RCGP) and produces ILI estimates using Twitter posts or Google search queries. RCGP ILI rates for different age groups and laboratory-confirmed cases by influenza type are used to evaluate the models, with a particular focus on predicting the onset, overall intensity, peak activity and duration of the 2015/16 influenza season. We show that the Twitter-based models perform poorly and hypothesise that this is mostly due to the sparsity of the data available and a limited training period. Conversely, the Google-based model provides accurate estimates with timeliness of approximately one week and has the potential to complement current surveillance systems.
Affiliation(s)
- Moritz Wagner
  - Public Health England, London, UK
  - University College London, London, United Kingdom
  - London School of Hygiene and Tropical Medicine, London, United Kingdom
- Vasileios Lampos
  - Department of Computer Science, University College London, London, UK
- Ingemar J Cox
  - Department of Computer Science, University College London, London, UK
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
49
Morita H, Kramer S, Heaney A, Gil H, Shaman J. Influenza forecast optimization when using different surveillance data types and geographic scale. Influenza Other Respir Viruses 2018; 12:755-764. [PMID: 30028083 PMCID: PMC6185890 DOI: 10.1111/irv.12594] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 02/12/2018] [Accepted: 07/11/2018] [Indexed: 11/15/2022]
Abstract
BACKGROUND Advance warning of influenza incidence levels from skillful forecasts could help public health officials and healthcare providers implement more timely preparedness and intervention measures to combat outbreaks. Compared to influenza predictions generated at regional and national levels, those generated at finer scales could offer greater value in determining locally appropriate measures; however, to date, the various influenza surveillance data that are collected by state and county departments of health have not been well utilized in influenza prediction. OBJECTIVES To assess whether an influenza forecast model system can be optimized to generate accurate forecasts using novel surveillance data streams. METHODS Here, we generate retrospective influenza forecasts with a dynamic, compartmental model-inference system using surveillance data for influenza-like illness (ILI), laboratory-confirmed cases, and pneumonia and influenza mortality at state and county levels. We evaluate how specification of 3 system inputs (scaling, observational error variance [OEV], and filter divergence [lambda]) affects forecast accuracy. RESULTS In retrospective forecasts, and across data types, there were no clear optimal combinations for the 3 system inputs; however, scaling was most critical to forecast accuracy, whereas OEV and lambda were not. CONCLUSIONS Forecasts using new data streams should be tested to determine an appropriate scaling value using historical data and analyzed for forecast accuracy.
Collapse
Affiliation(s)
- Haruka Morita
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York City, New York
- Sarah Kramer
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York City, New York
- Alexandra Heaney
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York City, New York
- Harold Gil
  - Marion County Public Health Department, Indianapolis, Indiana
- Jeffrey Shaman
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York City, New York
|
50
|
Zhao Y, Xu Q, Chen Y, Tsui KL. Using Baidu index to nowcast hand-foot-mouth disease in China: a meta learning approach. BMC Infect Dis 2018; 18:398. [PMID: 30103690 PMCID: PMC6090735 DOI: 10.1186/s12879-018-3285-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 11/09/2017] [Accepted: 07/31/2018] [Indexed: 02/04/2023]
Abstract
BACKGROUND Hand, foot, and mouth disease (HFMD) has been recognized as one of the leading infectious diseases among children in China, where it has caused hundreds of deaths annually since 2008. In China, reports of monthly HFMD cases usually have a delay of 1-2 months due to the time needed for collecting and processing clinical information. This time lag is far from optimal for policymakers making decisions. To alleviate this information gap, this study uses a meta learning framework and incorporates publicly available Internet-based information (Baidu search queries) for real-time estimation of HFMD cases. METHODS We incorporate the Baidu index into modeling to nowcast the monthly HFMD incidence in Guangxi, Zhejiang, and Henan provinces and in the whole of China. We develop a meta learning framework to select an appropriate predictive model based on statistical and time series meta features. Our proposed approach is assessed on HFMD cases from July 2015 to June 2016 using multiple evaluation metrics, including root mean squared error (RMSE) and correlation coefficient (Corr). RESULTS For the four areas (the whole of China, Guangxi, Zhejiang, and Henan), our approach is superior to the best competing models, reducing the RMSE by 37%, 20%, 20%, and 30%, respectively. Compared with all the alternative predictive methods, our estimates show the strongest correlation with the observations. CONCLUSIONS In this study, the proposed meta learning method significantly improves HFMD prediction accuracy, demonstrating that: (1) Internet-based information offers the possibility of effective HFMD nowcasts; (2) the meta learning approach is capable of adapting to a wide variety of data and enables selection of an appropriate method for improving nowcasting accuracy.
Affiliation(s)
- Yang Zhao
  - Centre for System Informatics Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong Special Administrative Region, People's Republic of China
- Qinneng Xu
  - Department of Systems Engineering and Engineering Management, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong Special Administrative Region, People's Republic of China
- Yupeng Chen
  - Department of Systems Engineering and Engineering Management, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong Special Administrative Region, People's Republic of China
- Kwok Leung Tsui
  - Centre for System Informatics Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong Special Administrative Region, People's Republic of China
  - Department of Systems Engineering and Engineering Management, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong Special Administrative Region, People's Republic of China
|