1
Ning S, Hussain A, Wang Q. Incorporating connectivity among Internet search data for enhanced influenza-like illness tracking. PLoS One 2024; 19:e0305579. [PMID: 39186560 PMCID: PMC11346739 DOI: 10.1371/journal.pone.0305579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Accepted: 06/02/2024] [Indexed: 08/28/2024] Open
Abstract
Big data collected from the Internet possess great potential to reveal the ever-changing trends in society. In particular, accurate infectious disease tracking with Internet data has grown in popularity, providing invaluable information for public health decision makers and the general public. However, much of the complex connectivity among the Internet search data is not effectively addressed among existing disease tracking frameworks. To this end, we propose ARGO-C (Augmented Regression with Clustered GOogle data), an integrative, statistically principled approach that incorporates the clustering structure of Internet search data to enhance the accuracy and interpretability of disease tracking. Focusing on multi-resolution %ILI (influenza-like illness) tracking, we demonstrate the improved performance and robustness of ARGO-C over benchmark methods at various geographical resolutions. We also highlight the adaptability of ARGO-C to track various diseases in addition to influenza, and to track other social or economic trends.
Affiliation(s)
- Shaoyang Ning
  - Department of Mathematics and Statistics, Williams College, Williamstown, MA, United States of America
- Ahmed Hussain
  - Department of Mathematics and Statistics, Williams College, Williamstown, MA, United States of America
- Qing Wang
  - Department of Mathematics, Wellesley College, Wellesley, MA, United States of America
2
Yang CX, Baker LM, McLeod-Morin A. Trending ticks: using Google Trends data to understand tickborne disease prevention. Front Public Health 2024; 12:1410713. [PMID: 38939559 PMCID: PMC11208696 DOI: 10.3389/fpubh.2024.1410713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Accepted: 06/03/2024] [Indexed: 06/29/2024] Open
Abstract
Introduction Ticks and pathogens they carry seriously impact human and animal health, with some diseases like Lyme and Alpha-gal syndrome posing risks. Searching for health information online can change people's health and preventive behaviors, allowing them to face the tick risks. This study aimed to predict the potential risks of tickborne diseases by examining individuals' online search behavior. Methods By scrutinizing the search trends across various geographical areas and timeframes within the United States, we determined outdoor activities associated with potential risks of tick-related diseases. Google Trends was used as the data collection and analysis tool due to its accessibility to big data on people's online searching behaviors. We interact with vast amounts of population search data and provide inferences between population behavior and health-related phenomena. Data were collected in the United States from April 2022 to March 2023, with some terms about outdoor activities and tick risks. Results and Discussion Results highlighted the public's risk susceptibility and severity when participating in activities. Our results found that searches for terms related to tick risk were associated with the five-year average Lyme Disease incidence rates by state, reflecting the predictability of online health searching for tickborne disease risks. Geographically, the results revealed that the states with the highest relative search volumes for tick-related terms were predominantly located in the Eastern region. Periodically, terms can be found to have higher search records during summer. In addition, the results showed that terms related to outdoor activities, such as "corn maze," "hunting," "u-pick," and "park," have moderate associations with tick-related terms. This study provided recommendations for effective communication strategies to encourage the public's adoption of health-promoting behaviors. 
Displaying warnings in the online search results of individuals who are at high risk for tick exposure or collaborating with outdoor activity locations to disseminate physical preventive messages may help mitigate the risks associated with tickborne diseases.
Affiliation(s)
- Cheng-Xian Yang
  - Department of Agricultural Education and Communication, University of Florida, Gainesville, FL, United States
- Lauri M. Baker
  - Department of Agricultural Education and Communication, University of Florida, Gainesville, FL, United States
  - UF/IFAS Center for Public Issues Education in Agriculture and Natural Resources, Gainesville, FL, United States
- Ashley McLeod-Morin
  - UF/IFAS Center for Public Issues Education in Agriculture and Natural Resources, Gainesville, FL, United States
3
López L, Dommar C, San José A, Meyers L, Fox S, Castro L, Rodó X. Changing risk of arboviral emergence in Catalonia due to higher probability of autochthonous outbreaks. Ecol Modell 2023. [DOI: 10.1016/j.ecolmodel.2022.110258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2023]
4
Stolerman LM, Clemente L, Poirier C, Parag KV, Majumder A, Masyn S, Resch B, Santillana M. Using digital traces to build prospective and real-time county-level early warning systems to anticipate COVID-19 outbreaks in the United States. SCIENCE ADVANCES 2023; 9:eabq0199. [PMID: 36652520 PMCID: PMC9848273 DOI: 10.1126/sciadv.abq0199] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/12/2022] [Accepted: 12/19/2022] [Indexed: 06/17/2023]
Abstract
Coronavirus disease 2019 (COVID-19) continues to affect the world, and the design of strategies to curb disease outbreaks requires close monitoring of their trajectories. We present machine learning methods that leverage internet-based digital traces to anticipate sharp increases in COVID-19 activity in U.S. counties. In a complementary direction to the efforts led by the Centers for Disease Control and Prevention (CDC), our models are designed to detect the time when an uptrend in COVID-19 activity will occur. Motivated by the need for finer spatial resolution epidemiological insights, we build upon previous efforts conceived at the state level. Our methods-tested in an out-of-sample manner, as events were unfolding, in 97 counties representative of multiple population sizes across the United States-frequently anticipated increases in COVID-19 activity 1 to 6 weeks before local outbreaks, defined when the effective reproduction number Rt becomes larger than 1 for a period of 2 weeks.
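The outbreak definition in this abstract (Rt larger than 1 for a period of 2 weeks) can be illustrated with a minimal Python sketch. It assumes one Rt estimate per week; the function name `outbreak_onset` and the sample values are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def outbreak_onset(rt: Sequence[float], min_run: int = 2) -> Optional[int]:
    """Return the index of the first period that starts a run of
    `min_run` consecutive Rt values above 1, or None if there is none."""
    run = 0
    for i, value in enumerate(rt):
        # Extend the current run when Rt > 1, otherwise reset it.
        run = run + 1 if value > 1.0 else 0
        if run == min_run:
            return i - min_run + 1
    return None


# Hypothetical weekly Rt estimates for one county.
weekly_rt = [0.8, 0.9, 1.1, 0.95, 1.05, 1.2, 1.3]
onset_week = outbreak_onset(weekly_rt)
print(onset_week)
```

Week 2 exceeds 1 only once, so it does not count; the first sustained run begins at index 4.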
Affiliation(s)
- Lucas M. Stolerman
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Department of Mathematics, Oklahoma State University, Stillwater, OK, USA
- Leonardo Clemente
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Machine Intelligence Group for the Betterment of Health and the Environment, Network Science Institute, Northeastern University, Boston, MA, USA
- Canelle Poirier
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Kris V. Parag
  - NIHR Health Protection Research Unit, Behavioural Science and Evaluation, University of Bristol, Bristol, UK
- Serge Masyn
  - Global Public Health, Janssen R&D, Beerse, Belgium
- Bernd Resch
  - Department of Geoinformatics - Z-GIS, University of Salzburg, Salzburg, Austria
  - Center for Geographic Analysis, Harvard University, Cambridge, MA, USA
- Mauricio Santillana
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Machine Intelligence Group for the Betterment of Health and the Environment, Network Science Institute, Northeastern University, Boston, MA, USA
  - Harvard University, T.H. Chan School of Public Health, Boston, MA, USA
5
Abdullah NAMH, Dom NC, Salleh SA, Salim H, Precha N. The association between dengue case and climate: A systematic review and meta-analysis. One Health 2022; 15:100452. [PMID: 36561711 PMCID: PMC9767811 DOI: 10.1016/j.onehlt.2022.100452] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/11/2022] [Revised: 10/29/2022] [Accepted: 10/30/2022] [Indexed: 11/08/2022] Open
Abstract
Although previous research frequently indicates that climate factors impact dengue transmission, the results are inconsistent. Therefore, this systematic review and meta-analysis highlights and addresses the complex global health problems at the human-environment interface and the inter-relationship between these variables. For this purpose, four online electronic databases were searched to conduct a systematic assessment of published studies reporting the association between dengue cases and climate between 2010 and 2022. The meta-analysis was conducted using random effects to assess correlation, publication bias and heterogeneity. The final assessment included eight studies for both systematic review and meta-analysis. A total of four meta-analyses were conducted to evaluate the correlation of dengue cases with climate variables, namely precipitation, temperature, minimum temperature and relative humidity. The highest correlation is observed for precipitation between 83 mm and 15 mm (r = 0.38, 95% CI = 0.31, 0.45), relative humidity between 60.5% and 88.7% (r = 0.30, 95% CI = 0.23, 0.37), minimum temperature between 6.5 °C and 21.4 °C (r = 0.28, 95% CI = 0.05, 0.48) and mean temperature between 21.0 °C and 29.8 °C (r = 0.07, 95% CI = -0.1, 0.24). Thus, the influence of climate variables on the magnitude of dengue cases in terms of their distribution, frequency, and prevailing variables was established and conceptualised. The results of this meta-analysis enable multidisciplinary collaboration to improve dengue surveillance, epidemiology, and prevention programmes.
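To make the random-effects pooling of correlations concrete, here is a minimal Python sketch. The abstract does not name its estimator, so the Fisher z-transform and the DerSimonian-Laird between-study variance used here are assumptions, and the study values are hypothetical rather than drawn from the review.

```python
import math
from typing import List, Tuple


def pooled_correlation(studies: List[Tuple[float, int]]) -> Tuple[float, float, float]:
    """Pool (r, n) pairs with a DerSimonian-Laird random-effects model.

    Returns (pooled r, lower 95% bound, upper 95% bound)."""
    z = [math.atanh(r) for r, _ in studies]      # Fisher z-transform
    v = [1.0 / (n - 3) for _, n in studies]      # within-study variance of z
    w = [1.0 / vi for vi in v]                   # fixed-effect weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(studies) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]       # random-effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_re, z)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return (math.tanh(z_re),
            math.tanh(z_re - 1.96 * se),
            math.tanh(z_re + 1.96 * se))


# Hypothetical (correlation, sample size) pairs for illustration only.
example = [(0.25, 60), (0.40, 120), (0.35, 90)]
r, low, high = pooled_correlation(example)
print(f"pooled r = {r:.2f}, 95% CI = {low:.2f}, {high:.2f}")
```

Pooling on the z scale and transforming back keeps the confidence interval inside the valid range for a correlation.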
Affiliation(s)
- Nur Athen Mohd Hardy Abdullah
  - Faculty of Health Sciences, Universiti Teknologi MARA (UiTM), UITM Cawangan Selangor, 42300 Puncak Alam, Selangor, Malaysia
- Nazri Che Dom
  - Faculty of Health Sciences, Universiti Teknologi MARA (UiTM), UITM Cawangan Selangor, 42300 Puncak Alam, Selangor, Malaysia
  - Integrated Mosquito Research Group (I-MeRGe), Universiti Teknologi MARA (UiTM), UITM Cawangan Selangor, 42300 Puncak Alam, Selangor, Malaysia
  - Institute for Biodiversity and Sustainable Development (IBSD), Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia
  - Corresponding author at: Faculty of Health Sciences, Universiti Teknologi MARA, Malaysia.
- Siti Aekball Salleh
  - Institute for Biodiversity and Sustainable Development (IBSD), Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia
- Hasber Salim
  - School of Biological Sciences, Universiti Sains Malaysia, 11800 Penang, Malaysia
- Nopadol Precha
  - Department of Environmental Health and Technology, School of Public Health, Walailak University, Nakhon Si Thammarat, Thailand
6
Li Z. Forecasting Weekly Dengue Cases by Integrating Google Earth Engine-Based Risk Predictor Generation and Google Colab-Based Deep Learning Modeling in Fortaleza and the Federal District, Brazil. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:13555. [PMID: 36294134 PMCID: PMC9603269 DOI: 10.3390/ijerph192013555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2022] [Revised: 10/15/2022] [Accepted: 10/18/2022] [Indexed: 06/16/2023]
Abstract
Efficient and accurate dengue risk prediction is an important basis for dengue prevention and control, which faces challenges, such as downloading and processing multi-source data to generate risk predictors and consuming significant time and computational resources to train and validate models locally. In this context, this study proposed a framework for dengue risk prediction by integrating big geospatial data cloud computing based on Google Earth Engine (GEE) platform and artificial intelligence modeling on the Google Colab platform. It enables defining the epidemiological calendar, delineating the predominant area of dengue transmission in cities, generating the data of risk predictors, and defining multi-date ahead prediction scenarios. We implemented the experiments based on weekly dengue cases during 2013-2020 in the Federal District and Fortaleza, Brazil to evaluate the performance of the proposed framework. Four predictors were considered, including total rainfall (Rsum), mean temperature (Tmean), mean relative humidity (RHmean), and mean normalized difference vegetation index (NDVImean). Three models (i.e., random forest (RF), long-short term memory (LSTM), and LSTM with attention mechanism (LSTM-ATT)), and two modeling scenarios (i.e., modeling with or without dengue cases) were set to implement 1- to 4-week ahead predictions. A total of 24 models were built, and the results showed in general that LSTM and LSTM-ATT models outperformed RF models; modeling could benefit from using historical dengue cases as one of the predictors, and it makes the predicted curve fluctuation more stable compared with that only using climate and environmental factors; attention mechanism could further improve the performance of LSTM models. 
This study provides implications for future dengue risk prediction in terms of the effectiveness of GEE-based big geospatial data processing for risk predictor generation and Google Colab-based risk modeling and presents the benefits of using historical dengue data as one of the input features and the attention mechanism for LSTM modeling.
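One reading that is consistent with the count of 24 models in this abstract is three model types crossed with two modeling scenarios and four forecast horizons. The short Python sketch below enumerates that grid; the grouping is an assumption made for illustration, not a statement of how the author organised the experiments.

```python
from itertools import product

# Labels follow the abstract; the full cross product is an assumed reading.
models = ["RF", "LSTM", "LSTM-ATT"]
scenarios = ["with dengue cases", "without dengue cases"]
horizons_weeks = [1, 2, 3, 4]

configurations = list(product(models, scenarios, horizons_weeks))
print(len(configurations))  # 3 * 2 * 4 = 24
```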
Affiliation(s)
- Zhichao Li
  - Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
7
COVID-19 forecasts using Internet search information in the United States. Sci Rep 2022; 12:11539. [PMID: 35798774 PMCID: PMC9261899 DOI: 10.1038/s41598-022-15478-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/22/2021] [Accepted: 06/24/2022] [Indexed: 11/26/2022] Open
Abstract
As COVID-19 ravages the globe, accurate forecasts of the disease spread are crucial for situational awareness, resource allocation, and public health decision-making. As an alternative to the traditional disease surveillance data collected by the United States (US) Centers for Disease Control and Prevention (CDC), big data from the Internet, such as online search volumes, also contain valuable information for tracking infectious disease dynamics such as influenza epidemics. In this study, we develop a statistical model using Internet search volume of relevant queries to track and predict the COVID-19 pandemic in the United States. Inspired by the strong association between the COVID-19 death trend and symptom-related search queries such as “loss of taste”, we combine search volume information with COVID-19 time series information for US national level forecasts, while leveraging the cross-state cross-resolution spatial temporal framework, pooling information from search volume and COVID-19 reports across regions for state level predictions. Lastly, we aggregate the state-level frameworks in an ensemble fashion to produce the final state-level 4-week forecasts. Our method outperforms the baseline time-series model, while performing reasonably against other publicly available benchmark models for both national and state level forecasts.
8
Wang T, Ma S, Baek S, Yang S. COVID-19 hospitalizations forecasts using internet search data. Sci Rep 2022; 12:9661. [PMID: 35690619 PMCID: PMC9188562 DOI: 10.1038/s41598-022-13162-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2022] [Accepted: 05/20/2022] [Indexed: 11/09/2022] Open
Abstract
As COVID-19 spreads across the globe and new variants keep emerging, reliable real-time forecasts of COVID-19 hospitalizations are critical for public health decisions on medical resource allocation. This paper aims to forecast national and state-level new COVID-19 hospital admissions in the United States two weeks ahead. Our method is inspired by the strong association between public search behavior and hospital admissions and is extended from a previously proposed influenza tracking model, AutoRegression with GOogle search data (ARGO). Our LASSO-penalized linear regression method efficiently combines Google search information and COVID-19 related time series information with dynamic training and rolling window prediction. Compared to other publicly available models collected from the COVID-19 forecast hub, our method achieves substantial error reduction in a retrospective out-of-sample evaluation from Jan 4, 2021, to Dec 27, 2021. Overall, we showed that our method is flexible, self-correcting, robust, accurate, and interpretable, making it a potentially powerful tool to assist healthcare officials and decision making for the current and future infectious disease outbreaks.
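The combination of LASSO-penalized regression with a rolling training window can be sketched in plain Python as follows. This is an illustrative simplification, not the authors' implementation: it forecasts one step ahead from a single lag of the target plus same-period search volumes, fits the LASSO by coordinate descent, and uses hypothetical data. The names `lasso_fit` and `rolling_forecasts`, the window length, and the penalty value are all assumptions.

```python
from typing import List, Sequence, Tuple


def soft_threshold(x: float, t: float) -> float:
    """Shrink x toward zero by t (the LASSO proximal step)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_fit(X: List[List[float]], y: List[float], lam: float,
              n_iter: int = 200) -> Tuple[float, List[float]]:
    """Minimise (1/2n)*||y - b0 - X b||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred features
    resid = [yi - y_mean for yi in y]  # residual with all coefficients at zero
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in Xc]
            scale = sum(c * c for c in col) / n
            if scale == 0.0:
                continue  # a constant feature carries no information
            # Covariance of feature j with the residual once its own effect is added back.
            rho = sum(c * (r + beta[j] * c) for c, r in zip(col, resid)) / n
            new_beta = soft_threshold(rho, lam) / scale
            resid = [r - (new_beta - beta[j]) * c for c, r in zip(col, resid)]
            beta[j] = new_beta
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def rolling_forecasts(target: Sequence[float], search: Sequence[Sequence[float]],
                      window: int, lam: float) -> List[float]:
    """Refit on the most recent `window` periods, then predict the next one."""
    preds = []
    for t in range(window + 1, len(target)):
        rows = range(t - window, t)
        X = [[target[s - 1]] + list(search[s]) for s in rows]
        y = [target[s] for s in rows]
        b0, b = lasso_fit(X, y, lam)
        x_new = [target[t - 1]] + list(search[t])
        preds.append(b0 + sum(bj * xj for bj, xj in zip(b, x_new)))
    return preds


# Hypothetical data: admissions depend linearly on one search-volume series.
search = [[float((t * 7) % 5)] for t in range(20)]
target = [10.0 + 2.0 * s[0] for s in search]
preds = rolling_forecasts(target, search, window=10, lam=0.01)
actual = target[11:]
```

Refitting at every step is what lets the coefficients adapt as search behavior drifts, which corresponds to the dynamic training the abstract describes.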
Affiliation(s)
- Tao Wang
  - H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30309, USA
- Simin Ma
  - H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30309, USA
- Soobin Baek
  - H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30309, USA
- Shihao Yang
  - H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30309, USA
9
Faster indicators of chikungunya incidence using Google searches. PLoS Negl Trop Dis 2022; 16:e0010441. [PMID: 35679262 PMCID: PMC9182328 DOI: 10.1371/journal.pntd.0010441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2021] [Accepted: 04/21/2022] [Indexed: 11/23/2022] Open
Abstract
Chikungunya, a mosquito-borne disease, is a growing threat in Brazil, where over 640,000 cases have been reported since 2017. However, there are often long delays between diagnoses of chikungunya cases and their entry in the national monitoring system, leaving policymakers without the up-to-date case count statistics they need. In contrast, weekly data on Google searches for chikungunya is available with no delay. Here, we analyse whether Google search data can help improve rapid estimates of chikungunya case counts in Rio de Janeiro, Brazil. We build on a Bayesian approach suitable for data that is subject to long and varied delays, and find that including Google search data reduces both model error and uncertainty. These improvements are largest during epidemics, which are particularly important periods for policymakers. Including Google search data in chikungunya surveillance systems may therefore help policymakers respond to future epidemics more quickly. To respond quickly to disease outbreaks, policymakers need rapid data on the number of new infections. However, for many diseases, such data is very delayed, due to the administrative work required to record each case in a disease surveillance system. This is a problem for data on chikungunya, a mosquito-borne disease which is a growing threat in Brazil. In Rio de Janeiro, delays in chikungunya cases being recorded average four weeks. These delays are sometimes longer and sometimes shorter. In stark contrast to chikungunya data, data on what people are searching for on Google is available almost immediately. People suffering from chikungunya might search on Google for information about the disease. Here, we investigate whether rapidly available Google data can help generate quick estimates of the number of chikungunya cases in Rio de Janeiro in the previous week. Our model uses a Bayesian methodology to help account for the varying delays in the chikungunya data. 
We show that including Google search data in the model reduces both the error and uncertainty of the chikungunya case count estimates, in particular during epidemics. Our method could be used to help policymakers to respond more quickly to future chikungunya epidemics.
10
Güneri FD, Forestier FBE, Forestier RJ, Karaarslan F, Odabaşi E. YouTube as a source of information for water treatments. INTERNATIONAL JOURNAL OF BIOMETEOROLOGY 2022; 66:781-789. [PMID: 35094110 PMCID: PMC8800846 DOI: 10.1007/s00484-021-02236-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/04/2021] [Revised: 12/21/2021] [Accepted: 12/25/2021] [Indexed: 05/05/2023]
Abstract
The purpose of the study was to investigate the quality and reliability of YouTube videos as a source of information on water treatments. We searched videos on YouTube (www.youtube.com) using the following keywords: "health resort medicine," "spa treatment," "spa therapy," "hydrotherapy," "thermal medicine," "balneology," and "balneotherapy" on June 17th, 2021. The global quality scale (GQS) was used to evaluate the quality of the videos. Reliability was assessed using the modified DISCERN tool. Some other video parameters and sources of the videos were also recorded. One hundred twenty-one (121) videos were analyzed. The most common video source was advertisement (46.3%). GQS and modified DISCERN median scores were generally low. They were superior for "hydrotherapy" and "balneotherapy" and were also higher in videos uploaded by health-related persons or organizations (physicians, health-related professionals, and health-related websites). A statistically significant positive correlation was found between investigated parameters (like view ratio, number of likes, video power index, video length) and GQS. Among the investigated parameters, only video length was correlated with modified DISCERN. The median video power index scores were statistically higher for "spa therapy" and "spa treatment." The YouTube content linked with water treatments has poor quality and reliability most of the time. The hydrotherapy and balneotherapy keywords have the best quality and reliability. We think that designers of water treatment videos should involve health professionals more often so that the content of their videos will better explain the details of medical conditions or interventions. The scientific experts should ensure a consensus in terminology to strengthen the awareness of water treatments for patients and physicians.
Affiliation(s)
- Fulya Demircioğlu Güneri
  - Department of Medical Ecology and Hydroclimatology, Gülhane Training and Research Hospital, University of Health Sciences, Ankara, Turkey
- Fatih Karaarslan
  - Department of Medical Ecology and Hydroclimatology, Gülhane Training and Research Hospital, University of Health Sciences, Ankara, Turkey
- Ersin Odabaşi
  - Department of Medical Ecology and Hydroclimatology, Gülhane Training and Research Hospital, University of Health Sciences, Ankara, Turkey
11
Koplewitz G, Lu F, Clemente L, Buckee C, Santillana M. Predicting dengue incidence leveraging internet-based data sources. A case study in 20 cities in Brazil. PLoS Negl Trop Dis 2022; 16:e0010071. [PMID: 35073316 PMCID: PMC8824328 DOI: 10.1371/journal.pntd.0010071] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/03/2020] [Revised: 02/08/2022] [Accepted: 12/07/2021] [Indexed: 11/25/2022] Open
Abstract
The dengue virus affects millions of people every year worldwide, causing large epidemic outbreaks that disrupt people’s lives and severely strain healthcare systems. In the absence of a reliable vaccine against dengue or an effective treatment to manage the illness in humans, most efforts to combat dengue infections have focused on preventing its vectors, mainly the Aedes aegypti mosquito, from flourishing across the world. These mosquito-control strategies need reliable disease activity surveillance systems to be deployed. Despite significant efforts to estimate dengue incidence using a variety of data sources and methods, little work has been done to understand the relative contribution of the different data sources to improved prediction. Additionally, scholarship on the topic had initially focused on prediction systems at the national- and state-levels, and much remains to be done at the finer spatial resolutions at which health policy interventions often occur. We develop a methodological framework to assess and compare dengue incidence estimates at the city level, and evaluate the performance of a collection of models on 20 different cities in Brazil. The data sources we use towards this end are weekly incidence counts from prior years (seasonal autoregressive terms), weekly-aggregated weather variables, and real-time internet search data. We find that both random forest-based models and LASSO regression-based models effectively leverage these multiple data sources to produce accurate predictions, and that while the performance between them is comparable on average, the former method produces fewer extreme outliers, and can thus be considered more robust. 
For real-time predictions that assume long delays (6–8 weeks) in the availability of epidemiological data, we find that real-time internet search data are the strongest predictors of dengue incidence, whereas for predictions that assume short delays (1–3 weeks), in which the error rate is halved (as measured by relative RMSE), short-term and seasonal autocorrelation are the dominant predictors. Despite the difficulties inherent to city-level prediction, our framework achieves meaningful and actionable estimates across cities with different demographic, geographic and epidemic characteristics. As the incidence of infectious diseases like dengue continues to increase throughout the world, tracking their spread in real time poses a significant challenge to local and national health authorities. Accurate incidence data are often difficult to obtain as outbreaks emerge and unfold, both due to the partial reach of serological surveillance (especially in rural areas), and due to delays in reporting, which result in post-hoc adjustments to what should have been real-time data. Thus, a range of ‘nowcasting’ tools have been developed to estimate disease trends, using different mathematical and statistical methodologies to fill the temporal data gap. Over the past several years, researchers have investigated how to best incorporate internet search data into predictive models, since these can be obtained in real-time. Still, most such models have been regression-based, and have tended to underperform in cases when epidemiological data are only available after long reporting delays. Moreover, in tropical countries, attention has increasingly turned from testing and applying models at the national level to models at higher spatial resolutions, such as states and cities. Here, we develop machine learning models based on both LASSO regression and on random forest ensembles, and proceed to apply and compare them across 20 cities in Brazil. 
We find that our methodology produces meaningful and actionable disease estimates at the city level with both underlying model classes, and that the two perform comparably across most metrics, although the ensemble method produces fewer outliers. We also compare model performance and the relative contribution of different data sources across diverse geographic, demographic and epidemic conditions.
Affiliation(s)
- Gal Koplewitz
  - Harvard J. A. Paulson School of Engineering and Applied Sciences, Cambridge, Massachusetts, United States of America
  - Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - * E-mail: (GK); (MS)
- Fred Lu
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
  - Department of Statistics, Stanford University, California, United States of America
- Leonardo Clemente
  - Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Caroline Buckee
  - Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Mauricio Santillana
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
  - Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, United States of America
  - * E-mail: (GK); (MS)
12
Data-driven methods for dengue prediction and surveillance using real-world and Big Data: A systematic review. PLoS Negl Trop Dis 2022; 16:e0010056. [PMID: 34995281 PMCID: PMC8740963 DOI: 10.1371/journal.pntd.0010056] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 04/30/2021] [Accepted: 12/06/2021] [Indexed: 12/23/2022] Open
Abstract
Background Traditionally, dengue surveillance is based on case reporting to a central health agency. However, the delay between a case and its notification can limit the system responsiveness. Machine learning methods have been developed to reduce the reporting delays and to predict outbreaks, based on non-traditional and non-clinical data sources. The aim of this systematic review was to identify studies that used real-world data, Big Data and/or machine learning methods to monitor and predict dengue-related outcomes. Methodology/Principal findings We performed a search in PubMed, Scopus, Web of Science and grey literature between January 1, 2000 and August 31, 2020. The review (ID: CRD42020172472) focused on data-driven studies. Reviews, randomized control trials and descriptive studies were not included. Among the 119 studies included, 67% were published between 2016 and 2020, and 39% used at least one novel data stream. The aim of the included studies was to predict a dengue-related outcome (55%), assess the validity of data sources for dengue surveillance (23%), or both (22%). Most studies (60%) used a machine learning approach. Studies on dengue prediction compared different prediction models, or identified significant predictors among several covariates in a model. The most significant predictors were rainfall (43%), temperature (41%), and humidity (25%). The two models with the highest performances were Neural Networks and Decision Trees (52%), followed by Support Vector Machine (17%). We cannot rule out a selection bias in our study because of our two main limitations: we did not include preprints and could not obtain the opinion of other international experts. Conclusions/Significance Combining real-world data and Big Data with machine learning methods is a promising approach to improve dengue prediction and monitoring. 
Future studies should focus on how to better integrate all available data sources and methods to improve the response and dengue management by stakeholders.
Dengue is one of the most important arbovirus infections in the world and its public health, societal and economic burden is increasing. Although the majority of dengue cases are asymptomatic or mild, severe disease forms can lead to death. For this reason, early diagnosis and monitoring of dengue are crucial to decrease mortality. However, most endemic regions still rely on traditional monitoring methods, despite the growing availability of novel data sources and data-driven methods based on real-world data, Big Data, and machine learning algorithms. In this systematic review, we identified and analyzed studies that used these novel approaches for dengue monitoring and/or prediction. We found that novel data streams, such as Internet search engines and social media platforms, and machine learning methods can be successfully used to improve dengue management, but are still vastly ignored in real life. These approaches should be combined with traditional methods to help stakeholders better prepare for each outbreak and improve early responsiveness.
|
13
|
The Federal Menu Labeling Law and Twitter Discussions about Calories in the United States: An Interrupted Time-Series Analysis. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:ijerph182010794. [PMID: 34682538 PMCID: PMC8535269 DOI: 10.3390/ijerph182010794] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/08/2021] [Revised: 09/21/2021] [Accepted: 09/28/2021] [Indexed: 11/17/2022]
Abstract
Public awareness of calories in food sold in retail establishments is a primary objective of the menu labeling law. This study explores the extent to which we can use social media and internet search queries to understand whether the federal calorie labeling law increased awareness of calories. To evaluate the association of the federal menu labeling law with tweeting about calories we retrieved tweets that contained the term "calorie(s)" from the CompEpi Geo Twitter Database from 1 January through 31 December in 2016 and 2018. Within the same time period, we also retrieved time-series data for search queries related to calories via Google Trends (GT). Interrupted time-series analysis was used to test whether the federal menu labeling law was associated with a change in mentions of "calorie(s)" on Twitter and relative search queries to calories on GT. Before the implementation of the federal calorie labeling law on 7 May 2018, there was a significant decrease in the baseline trend of 4.37 × 10⁻⁸ (SE = 1.25 × 10⁻⁸, p < 0.001) mean daily ratio of calorie(s) tweets. A significant increase in post-implementation slope of 3.19 × 10⁻⁸ (SE = 1.34 × 10⁻⁸, p < 0.018) mean daily ratio of calorie(s) tweets was seen compared to the pre-implementation slope. An interrupted time-series (ITS) analysis showed a small, statistically significant upward trend of 0.0043 (SE = 0.036, p < 0.001) weekly search queries for calories pre-implementation, with no significant level change post-implementation. There was a decrease in trend of 1.22 (SE = 0.27, p < 0.001) in search queries for calories post-implementation. The federal calorie labeling law was associated with a 173% relative increase in the trend of mean daily ratio of tweets and a -28381% relative change in trend for search queries for calories. Twitter results demonstrate an increase in awareness of calories because of the addition of menu labels.
Google Trends results imply that fewer people are searching for the calorie content of their meal, which may no longer be needed since calorie information is provided at point of purchase. Given our findings, discussions online about calories may provide a signal of an increased awareness in the implementation of calorie labels.
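The interrupted time-series design named in this abstract can be illustrated with a minimal sketch. It is not the study's code: the function names `fit_line` and `interrupted_time_series` and the synthetic series are assumptions made for illustration. Fitting separate least-squares lines before and after the interruption gives the same point estimates as a segmented regression with level-change and slope-change terms, but this sketch produces no standard errors and ignores autocorrelation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def interrupted_time_series(values, break_index):
    """Estimate pre-slope, slope change and level change.

    break_index is the position of the first post-intervention point
    (a hypothetical convention chosen for this sketch).
    """
    times = list(range(len(values)))
    pre_b0, pre_b1 = fit_line(times[:break_index], values[:break_index])
    post_b0, post_b1 = fit_line(times[break_index:], values[break_index:])
    # Level change: gap between the two fitted lines at the interruption.
    level_change = (post_b0 + post_b1 * break_index) - (pre_b0 + pre_b1 * break_index)
    return {
        "pre_slope": pre_b1,
        "slope_change": post_b1 - pre_b1,
        "level_change": level_change,
    }


# Synthetic series: slope -0.5 before t = 10, then a jump of +3 and slope +0.25.
series = [20 - 0.5 * t for t in range(10)]
series += [20 - 0.5 * 10 + 3 + 0.25 * (t - 10) for t in range(10, 20)]
result = interrupted_time_series(series, break_index=10)
print(result)
```

A relative change in trend, as reported in the abstract, would then be the slope change divided by the pre-implementation slope; the abstract does not state the exact formula used, so that step is left out here.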
|
14
|
Monnaka VU, Oliveira CACD. Google Trends correlation and sensitivity for outbreaks of dengue and yellow fever in the state of São Paulo. EINSTEIN-SAO PAULO 2021; 19:eAO5969. [PMID: 34346987 PMCID: PMC8302225 DOI: 10.31744/einstein_journal/2021ao5969] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/05/2020] [Accepted: 03/04/2021] [Indexed: 02/06/2023] Open
Abstract
Objective To assess Google Trends accuracy for epidemiological surveillance of dengue and yellow fever, and to compare the incidence of these diseases with the popularity of its terms in the state of São Paulo. Methods Retrospective cohort. Google Trends survey results were compared to the actual incidence of diseases, obtained from Centro de Vigilância Epidemiológica “Prof. Alexandre Vranjac”, in São Paulo, Brazil, in periods between 2017 and 2019. The correlation was calculated by Pearson’s coefficient and cross-correlation function. The accuracy was analyzed by sensitivity and specificity values. Results There was a statistically significant correlation between the variables studied for both diseases, Pearson coefficient of 0.91 for dengue and 0.86 for yellow fever. Correlation with up to 4 weeks of anticipation for time series was identified. Sensitivity was 87% and 90%, and specificity 69% and 78% for dengue and yellow fever, respectively. Conclusion The incidence of dengue and yellow fever in the State of São Paulo showed a strong correlation with the popularity of its terms measured by Google Trends in weekly periods. Google Trends tool provided early warning, with high sensitivity, for the detection of outbreaks of these diseases.
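The three measures used in this abstract (Pearson correlation, cross-correlation at a lead of several weeks, and sensitivity/specificity of an alert) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function names, the weekly numbers, and the alert flags are all made up, and the abstract does not say how an alert or an outbreak week was defined.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(searches, cases, lag):
    """Correlate searches in week t with cases in week t + lag (lag >= 0)."""
    if lag == 0:
        return pearson(searches, cases)
    return pearson(searches[:-lag], cases[lag:])


def sensitivity_specificity(alerts, outbreaks):
    """Return (sensitivity, specificity) for paired weekly boolean flags."""
    tp = sum(1 for a, o in zip(alerts, outbreaks) if a and o)
    fn = sum(1 for a, o in zip(alerts, outbreaks) if not a and o)
    tn = sum(1 for a, o in zip(alerts, outbreaks) if not a and not o)
    fp = sum(1 for a, o in zip(alerts, outbreaks) if a and not o)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical weekly search interest; cases follow searches by two weeks.
searches = [1, 3, 7, 12, 8, 4, 2, 1, 1, 2]
cases = [1, 1] + searches[:-2]

correlations = {lag: lagged_correlation(searches, cases, lag) for lag in range(5)}
best_lag = max(correlations, key=correlations.get)

# Hypothetical alert and outbreak flags, one per week.
alerts = [False, False, True, True, True, False, False, False, True, False]
outbreaks = [False, False, True, True, False, True, False, False, False, False]
sensitivity, specificity = sensitivity_specificity(alerts, outbreaks)

print(best_lag, sensitivity, specificity)
```

Scanning several lags and keeping the one with the highest correlation is one simple way to express "weeks of anticipation"; a full cross-correlation function would also examine negative lags.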
Affiliation(s)
- Vitor Ulisses Monnaka
- Faculdade Israelita de Ciências da Saúde Albert Einstein, Hospital Israelita Albert Einstein, São Paulo, SP, Brazil
| | | |
|
15
|
Hswen Y, Yom-Tov E. Analysis of a Vaping-Associated Lung Injury Outbreak through Participatory Surveillance and Archival Internet Data. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:ijerph18158203. [PMID: 34360495 PMCID: PMC8346109 DOI: 10.3390/ijerph18158203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2021] [Revised: 07/28/2021] [Accepted: 07/30/2021] [Indexed: 11/22/2022]
Abstract
The US Centers for Disease Control and Prevention alerted of a suspected outbreak of lung illness associated with using E-cigarette products in September 2019. At the time that the CDC published its alert little was known about the causes of the outbreak or who was at risk for it. Here we provide insights into the outbreak through analysis of passive reporting and participatory surveillance. We collected data about vaping habits and associated adverse reactions from four data sources pertaining to people in the USA: A participatory surveillance platform (YouVape), Reddit, Google Trends, and Bing. Data were analyzed to identify vaping behaviors and reported adverse events. These were correlated among sources and with prior reports. Data was obtained from 720 YouVape users, 4331 Reddit users, and over 1 million Bing users. Large geographic variation was observed across vaping products. Significant correlation was found among the data sources in reported adverse reactions. Models of participatory surveillance data found specific product and adverse reaction associations. Specifically, cannabidiol was found to be associated with fever, while tetrahydrocannabinol was found to be correlated with diarrhea. Our results demonstrate that utilization of different, complementary, online data sources provide a holistic view of vaping associated lung injury while augmenting traditional data sources.
Affiliation(s)
- Yulin Hswen
- Department of Epidemiology and Biostatistics, University of California at San Francisco, San Francisco, CA 94158, USA;
- Bakar Computational Health Sciences Institute, University of California at San Francisco, San Francisco, CA 94143, USA
- Innovation Program, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - Elad Yom-Tov
- Microsoft Research Israel, 3 Alan Turing Str., Herzeliya 4672415, Israel
- Faculty of Industrial Engineering and Management, Technion, Haifa 3200000, Israel
- Correspondence:
| |
|
16
|
Lu FS, Nguyen AT, Link NB, Molina M, Davis JT, Chinazzi M, Xiong X, Vespignani A, Lipsitch M, Santillana M. Estimating the cumulative incidence of COVID-19 in the United States using influenza surveillance, virologic testing, and mortality data: Four complementary approaches. PLoS Comput Biol 2021; 17:e1008994. [PMID: 34138845 PMCID: PMC8241061 DOI: 10.1371/journal.pcbi.1008994] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 06/19/2020] [Revised: 06/29/2021] [Accepted: 04/22/2021] [Indexed: 12/20/2022] Open
Abstract
Effectively designing and evaluating public health responses to the ongoing COVID-19 pandemic requires accurate estimation of the prevalence of COVID-19 across the United States (US). Equipment shortages and varying testing capabilities have however hindered the usefulness of the official reported positive COVID-19 case counts. We introduce four complementary approaches to estimate the cumulative incidence of symptomatic COVID-19 in each state in the US as well as Puerto Rico and the District of Columbia, using a combination of excess influenza-like illness reports, COVID-19 test statistics, COVID-19 mortality reports, and a spatially structured epidemic model. Instead of relying on the estimate from a single data source or method that may be biased, we provide multiple estimates, each relying on different assumptions and data sources. Across our four approaches emerges the consistent conclusion that on April 4, 2020, the estimated case count was 5 to 50 times higher than the official positive test counts across the different states. Nationally, our estimates of COVID-19 symptomatic cases as of April 4 have a likely range of 2.3 to 4.8 million, with possibly as many as 7.6 million cases, up to 25 times greater than the cumulative confirmed cases of about 311,000. Extending our methods to May 16, 2020, we estimate that cumulative symptomatic incidence ranges from 4.9 to 10.1 million, as opposed to 1.5 million positive test counts. The proposed combination of approaches may prove useful in assessing the burden of COVID-19 during resurgences in the US and other countries with comparable surveillance systems.
Affiliation(s)
- Fred S. Lu
- Department of Statistics, Stanford University, Stanford, California, United States of America
| | - Andre T. Nguyen
- University of Maryland, Baltimore County, Baltimore, Maryland, United States of America
- Booz Allen Hamilton, Columbia, Maryland, United States of America
| | - Nicholas B. Link
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
| | - Mathieu Molina
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
| | - Jessica T. Davis
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, Massachusetts, United States of America
| | - Matteo Chinazzi
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, Massachusetts, United States of America
| | - Xinyue Xiong
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, Massachusetts, United States of America
| | - Alessandro Vespignani
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, Massachusetts, United States of America
| | - Marc Lipsitch
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Mauricio Santillana
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, United States of America
| |
|
17
|
Aiken EL, Nguyen AT, Viboud C, Santillana M. Toward the use of neural networks for influenza prediction at multiple spatial resolutions. SCIENCE ADVANCES 2021; 7:7/25/eabb1237. [PMID: 34134985 PMCID: PMC8208709 DOI: 10.1126/sciadv.abb1237] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/31/2020] [Accepted: 04/29/2021] [Indexed: 05/24/2023]
Abstract
Mitigating the effects of disease outbreaks with timely and effective interventions requires accurate real-time surveillance and forecasting of disease activity, but traditional health care-based surveillance systems are limited by inherent reporting delays. Machine learning methods have the potential to fill this temporal "data gap," but work to date in this area has focused on relatively simple methods and coarse geographic resolutions (state level and above). We evaluate the predictive performance of a gated recurrent unit neural network approach in comparison with baseline machine learning methods for estimating influenza activity in the United States at the state and city levels and experiment with the inclusion of real-time Internet search data. We find that the neural network approach improves upon baseline models for long time horizons of prediction but is not improved by real-time internet search data. We conduct a thorough analysis of feature importances in all considered models for interpretability purposes.
Affiliation(s)
- Emily L Aiken
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA.
| | - Andre T Nguyen
- Booz Allen Hamilton, Columbia, MD 21044, USA
- University of Maryland, Baltimore County, Baltimore, MD 21250, USA
| | - Cecile Viboud
- Fogarty International Center, National Institutes of Health, Bethesda, MD 20892, USA
| | - Mauricio Santillana
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA.
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02215, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA 02215, USA
| |
|
18
|
Kardeş S. Public interest in spa therapy during the COVID-19 pandemic: analysis of Google Trends data among Turkey. INTERNATIONAL JOURNAL OF BIOMETEOROLOGY 2021; 65:945-950. [PMID: 33442780 PMCID: PMC7805426 DOI: 10.1007/s00484-021-02077-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/07/2020] [Revised: 12/28/2020] [Accepted: 01/05/2021] [Indexed: 05/03/2023]
Abstract
In Turkey, spas are widely used and preferred by patients who are seeking relief from their disability and pain. The spa therapy program is partly reimbursed by the national health insurance system. The objective of the present study was to leverage Google Trends to elucidate the public interest in spas in Turkey during the COVID-19 pandemic. Google Trends was queried to analyze search trends within Turkey for the Turkish term representing a spa (i.e., kaplıca) from January 01, 2016, to September 30, 2020. The relative search volume of "kaplıca" was statistically significantly decreased in the March 15-May 30, 2020 (-73.04%; p < 0.001); May 31-July 25, 2020 (-41.38%; p < 0.001); and July 26-September 19, 2020 (-29.98%; p < 0.001) periods compared to similar periods of preceding 4 years (2016-2019). After June 1, 2020, the relative search volume was shown to have a moderate recovery, without reaching the level of 2016-2019. Public interest in spas showed an initial sharp decline between mid-March and May, with a moderate increase during the June-August period. This finding might be indicative of public preference in undertaking spa therapy during the COVID-19 period. In Turkey, spas might be used to increase places providing rehabilitation for both non-COVID-19 patients and survivors of COVID-19 with long-term symptoms during the pandemic.
Affiliation(s)
- Sinan Kardeş
- Department of Medical Ecology and Hydroclimatology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey.
| |
|
19
|
Castro LA, Generous N, Luo W, Pastore y Piontti A, Martinez K, Gomes MFC, Osthus D, Fairchild G, Ziemann A, Vespignani A, Santillana M, Manore CA, Del Valle SY. Using heterogeneous data to identify signatures of dengue outbreaks at fine spatio-temporal scales across Brazil. PLoS Negl Trop Dis 2021; 15:e0009392. [PMID: 34019536 PMCID: PMC8174735 DOI: 10.1371/journal.pntd.0009392] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/24/2020] [Revised: 06/03/2021] [Accepted: 04/16/2021] [Indexed: 12/18/2022] Open
Abstract
Dengue virus remains a significant public health challenge in Brazil, and seasonal preparation efforts are hindered by variable intra- and interseasonal dynamics. Here, we present a framework for characterizing weekly dengue activity at the Brazilian mesoregion level from 2010-2016 as time series properties that are relevant to forecasting efforts, focusing on outbreak shape, seasonal timing, and pairwise correlations in magnitude and onset. In addition, we use a combination of 18 satellite remote sensing imagery, weather, clinical, mobility, and census data streams and regression methods to identify a parsimonious set of covariates that explain each time series property. The models explained 54% of the variation in outbreak shape, 38% of seasonal onset, 34% of pairwise correlation in outbreak timing, and 11% of pairwise correlation in outbreak magnitude. Regions that have experienced longer periods of drought sensitivity, as captured by the "normalized burn ratio," experienced less intense outbreaks, while regions with regular fluctuations in relative humidity had less regular seasonal outbreaks. Both the pairwise correlations in outbreak timing and outbreak trend between mesoregions were best predicted by distance. Our analysis also revealed the presence of distinct geographic clusters where dengue properties tend to be spatially correlated. Forecasting models aimed at predicting the dynamics of dengue activity need to identify the most salient variables capable of contributing to accurate predictions. Our findings show that successful models may need to leverage distinct variables in different locations and be catered to a specific task, such as predicting outbreak magnitude or timing characteristics, to be useful. This advocates in favor of "adaptive models" rather than "one-size-fits-all" models. The results of this study can be applied to improving spatial hierarchical or target-focused forecasting models of dengue activity across Brazil.
Affiliation(s)
- Lauren A. Castro
- Information Systems and Modeling Group, Analytics, Intelligence and Technology Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
| | - Nicholas Generous
- National Security and Defense Program Office, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
| | - Wei Luo
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, United States of America
- Geography Department, National University of Singapore, Singapore, Singapore
| | - Ana Pastore y Piontti
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, Massachusetts, United States of America
| | - Kaitlyn Martinez
- Information Systems and Modeling Group, Analytics, Intelligence and Technology Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Department of Mathematics & Statistics, Colorado School of Mines, Golden, Colorado, United States of America
| | - Marcelo F. C. Gomes
- Núcleo de Métodos Analíticos em Vigilância Epidemiológica Programa de Computação Científica, Fundação Oswaldo Cruz, Rio de Janeiro, RJ, Brazil
| | - Dave Osthus
- Statistical Sciences Group, Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
| | - Geoffrey Fairchild
- Information Systems and Modeling Group, Analytics, Intelligence and Technology Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
| | - Amanda Ziemann
- Space Data Science and Systems Group, Intelligence and Space Research Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
| | - Alessandro Vespignani
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, Massachusetts, United States of America
| | - Mauricio Santillana
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, United States of America
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
| | - Carrie A. Manore
- Information Systems and Modeling Group, Analytics, Intelligence and Technology Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
| | - Sara Y. Del Valle
- Information Systems and Modeling Group, Analytics, Intelligence and Technology Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
| |
|
20
|
Kardeş S, Kuzu AS, Pakhchanian H, Raiker R, Karagülle M. Population-level interest in anti-rheumatic drugs in the COVID-19 era: insights from Google Trends. Clin Rheumatol 2021; 40:2047-2055. [PMID: 33130946 PMCID: PMC7603411 DOI: 10.1007/s10067-020-05490-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/20/2020] [Revised: 10/21/2020] [Accepted: 10/28/2020] [Indexed: 12/23/2022]
Abstract
INTRODUCTION/OBJECTIVE The general public may utilize online information through search engines for implications and risks of some anti-rheumatic drugs. These drugs have been used in the management of coronavirus disease 2019 (COVID-19) and associated inflammatory sequelae or cytokine storm of infection. Therefore, the objective of this study was to investigate the population-level interest in anti-rheumatic drugs during the COVID-19 era, by analyzing changes in Google search frequency data. METHOD To obtain the relative search volume (RSV) of anti-rheumatic drugs, we queried Google Trends for 78 search terms representing non-steroidal anti-inflammatory drugs (NSAIDs), glucocorticoids, antigout agents, conventional disease-modifying anti-rheumatic drugs (DMARDs), immunosuppressants, biologics, and Janus kinase (JAK) inhibitors within the USA. Three 8-week periods in 2020 (March 15-May 9), (May 10-July 4), and (July 5-August 29) representing the initial- and short-term periods were compared to overlapping periods of the preceding 3 years (2017-2019). RESULTS We found statistically significant increases in RSV for colchicine, hydroxychloroquine, tocilizumab (and its brand name-Actemra), and anakinra, and statistically significant decreases among brand names of immunosuppressive agents (i.e., mycophenolate mofetil, azathioprine, cyclophosphamide, tacrolimus, cyclosporine) during both the initial- and short-term COVID-19 periods as compared to overlapping periods of the preceding 3 years. CONCLUSION There were significant increases in RSV of colchicine, hydroxychloroquine, tocilizumab, and anakinra during both initial- and short-term COVID-19 periods when compared to overlapping periods of the preceding 3 years reflecting a heightened level of information-seeking on these drugs during the pandemic. Rheumatologists should address this increase in informational demand. 
Further research assessing medium- and long-term interest in anti-rheumatic drugs is required to increase our knowledge on this new pandemic.
Key Points
• This study aimed to investigate the population-level interest in anti-rheumatic drugs in the COVID-19 era, by analyzing changes in Google search frequency data.
• Significant increases were seen in relative searches for colchicine, hydroxychloroquine, tocilizumab, and anakinra during both initial and short-term COVID-19 periods when compared to similar periods of 2017-2019, reflecting a heightened level of information-seeking on these drugs during the pandemic.
• Rheumatologists should address this increase in informational demand for colchicine, hydroxychloroquine, tocilizumab, and anakinra.
Affiliation(s)
- Sinan Kardeş
- Department of Medical Ecology and Hydroclimatology, Istanbul Faculty of Medicine, Istanbul University, Capa-Fatih, 34093 Istanbul, Turkey
| | - Ali Suat Kuzu
- Department of Medical Ecology and Hydroclimatology, Istanbul Faculty of Medicine, Istanbul University, Capa-Fatih, 34093 Istanbul, Turkey
| | - Haig Pakhchanian
- George Washington University School of Medicine & Health Science, Washington, DC USA
| | - Rahul Raiker
- West Virginia University School of Medicine, Morgantown, WV USA
| | - Mine Karagülle
- Department of Medical Ecology and Hydroclimatology, Istanbul Faculty of Medicine, Istanbul University, Capa-Fatih, 34093 Istanbul, Turkey
| |
|
21
|
Yi D, Ning S, Chang CJ, Kou SC. Forecasting Unemployment Using Internet Search Data via PRISM. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1883436] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Affiliation(s)
| | - Shaoyang Ning
- Department of Mathematics and Statistics, Williams College, Williamstown, MA
| | - Chia-Jung Chang
- Department of Statistics and Applied Probability, National University of Singapore, Singapore
| | - S. C. Kou
- Department of Statistics, Harvard University, Cambridge, MA
| |
|
22
|
|
23
|
Use Internet search data to accurately track state level influenza epidemics. Sci Rep 2021; 11:4023. [PMID: 33597556 PMCID: PMC7889878 DOI: 10.1038/s41598-021-83084-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/05/2020] [Accepted: 01/28/2021] [Indexed: 11/22/2022] Open
Abstract
For epidemics control and prevention, timely insights of potential hot spots are invaluable. Alternative to traditional epidemic surveillance, which often lags behind real time by weeks, big data from the Internet provide important information of the current epidemic trends. Here we present a methodology, ARGOX (Augmented Regression with GOogle data CROSS space), for accurate real-time tracking of state-level influenza epidemics in the United States. ARGOX combines Internet search data at the national, regional and state levels with traditional influenza surveillance data from the Centers for Disease Control and Prevention, and accounts for both the spatial correlation structure of state-level influenza activities and the evolution of people’s Internet search pattern. ARGOX achieves on average 28% error reduction over the best alternative for real-time state-level influenza estimation for 2014 to 2020. ARGOX is robust and reliable and can be potentially applied to track county- and city-level influenza activity and other infectious diseases.
|
24
|
Kiang MV, Santillana M, Chen JT, Onnela JP, Krieger N, Engø-Monsen K, Ekapirat N, Areechokchai D, Prempree P, Maude RJ, Buckee CO. Incorporating human mobility data improves forecasts of Dengue fever in Thailand. Sci Rep 2021; 11:923. [PMID: 33441598 PMCID: PMC7806770 DOI: 10.1038/s41598-020-79438-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 06/08/2020] [Accepted: 11/19/2020] [Indexed: 01/08/2023] Open
Abstract
Over 390 million people worldwide are infected with dengue fever each year. In the absence of an effective vaccine for general use, national control programs must rely on hospital readiness and targeted vector control to prepare for epidemics, so accurate forecasting remains an important goal. Many dengue forecasting approaches have used environmental data linked to mosquito ecology to predict when epidemics will occur, but these have had mixed results. Conversely, human mobility, an important driver in the spatial spread of infection, is often ignored. Here we compare time-series forecasts of dengue fever in Thailand, integrating epidemiological data with mobility models generated from mobile phone data. We show that geographically-distant provinces strongly connected by human travel have more highly correlated dengue incidence than weakly connected provinces of the same distance, and that incorporating mobility data improves traditional time-series forecasting approaches. Notably, no single model or class of model always outperformed others. We propose an adaptive, mosaic forecasting approach for early warning systems.
Affiliation(s)
- Mathew V Kiang
- Department of Epidemiology and Population Health, Stanford University, Stanford, CA, USA
| | - Mauricio Santillana
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
| | - Jarvis T Chen
- Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Jukka-Pekka Onnela
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Nancy Krieger
- Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | | | - Nattwut Ekapirat
- Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
| | - Darin Areechokchai
- Bureau of Vector Borne Disease, Ministry of Public Health, Nonthaburi, Thailand
| | - Preecha Prempree
- Bureau of Vector Borne Disease, Ministry of Public Health, Nonthaburi, Thailand
| | - Richard J Maude
- Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Center for Communicable Disease Dynamics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, 5th Floor, Boston, MA, 02115, USA
| | - Caroline O Buckee
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Center for Communicable Disease Dynamics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, 5th Floor, Boston, MA, 02115, USA.
| |
|
25
|
Guo Y, Feng Y, Qu F, Zhang L, Yan B, Lv J. Prediction of hepatitis E using machine learning models. PLoS One 2020; 15:e0237750. [PMID: 32941452 PMCID: PMC7497991 DOI: 10.1371/journal.pone.0237750] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2020] [Accepted: 08/01/2020] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Accurate and reliable predictions of infectious disease can be valuable to public health organizations that plan interventions to decrease or prevent disease transmission. A great variety of models have been developed for this task. However, the performance of these models varies across data series. Hepatitis E, an acute liver disease, has been a major public health problem. Which model is most appropriate for predicting the incidence of hepatitis E? In this paper, three different methods are used and their performance is compared. METHODS Autoregressive integrated moving average (ARIMA), support vector machine (SVM) and long short-term memory (LSTM) recurrent neural network models were adopted and compared. ARIMA was implemented in Python with statsmodels. SVM was implemented in MATLAB with the libSVM library. The LSTM was designed by the authors with Keras, a deep learning library. To tackle the problem of overfitting caused by limited training samples, we adopted dropout and regularization strategies in our LSTM model. Experimental data were the monthly incidence and number of cases of hepatitis E from January 2005 to December 2017 in Shandong province, China. We selected data from July 2015 to December 2017 to validate the models, and the rest was taken as the training set. Three metrics were applied to compare the performance of the models: root mean square error (RMSE), mean absolute percentage error (MAPE) and mean absolute error (MAE). RESULTS By analyzing the data, we took ARIMA(1, 1, 1) and ARIMA(3, 1, 2) as the monthly incidence prediction model and the case number prediction model, respectively. Cross-validation and grid search were used to optimize the parameters of the SVM. Penalty coefficient C and kernel function parameter g were set to 8 and 0.125 for incidence prediction, and to 22 and 0.01 for case number prediction. The LSTM has 4 nodes. Dropout and L2 regularization parameters were set to 0.15 and 0.001, respectively. 
By the RMSE metric, we obtained 0.022, 0.0204 and 0.01 for incidence prediction using ARIMA, SVM and LSTM, and 22.25, 20.0368 and 11.75 for case number prediction using the three models. For the MAPE metric, the results were 23.5%, 21.7% and 15.08% for incidence prediction and 23.6%, 21.44% and 13.6% for case number prediction. For the MAE metric, the results were 0.018, 0.0167 and 0.011 for incidence prediction and 18.003, 16.5815 and 9.984 for case number prediction. CONCLUSIONS Comparing ARIMA, SVM and LSTM, we found that nonlinear models (SVM, LSTM) outperform linear models (ARIMA). LSTM obtained the best performance on all three metrics (RMSE, MAPE, MAE). Hence, LSTM is the most suitable for predicting hepatitis E monthly incidence and number of cases.
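The three error metrics named in this abstract can be sketched in a few lines of Python. This is a minimal illustration using the standard textbook definitions, not the authors' code; the function names and the sample numbers are assumptions made for the example and do not come from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical monthly case counts and forecasts (illustrative only).
observed = [10, 20, 40]
forecast = [12, 18, 44]
print(rmse(observed, forecast), mae(observed, forecast), mape(observed, forecast))
```

RMSE penalizes large misses more heavily than MAE, while MAPE is scale-free, which is why the abstract can report it for both the incidence and the case-count series on a comparable footing.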
Affiliation(s)
- Yanhui Guo
- School of Data and Computer Science, Shandong Women’s University, Jinan, Shandong, China
| | - Yi Feng
- Shandong Provincial Key Laboratory of Infectious Disease Control and Prevention, Shandong Center for Disease Control and Prevention, Jinan, Shandong, China
- Academy of Preventive Medicine, Shandong University, Jinan, Shandong, China
| | - Fuli Qu
- School of Data and Computer Science, Shandong Women’s University, Jinan, Shandong, China
| | - Li Zhang
- Shandong Provincial Key Laboratory of Infectious Disease Control and Prevention, Shandong Center for Disease Control and Prevention, Jinan, Shandong, China
- Academy of Preventive Medicine, Shandong University, Jinan, Shandong, China
| | - Bingyu Yan
- Shandong Provincial Key Laboratory of Infectious Disease Control and Prevention, Shandong Center for Disease Control and Prevention, Jinan, Shandong, China
- Academy of Preventive Medicine, Shandong University, Jinan, Shandong, China
| | - Jingjing Lv
- Shandong Provincial Key Laboratory of Infectious Disease Control and Prevention, Shandong Center for Disease Control and Prevention, Jinan, Shandong, China
- Academy of Preventive Medicine, Shandong University, Jinan, Shandong, China
| |
|
26
|
Chevalier-Cottin EP, Ashbaugh H, Brooke N, Gavazzi G, Santillana M, Burlet N, Tin Tin Htar M. Communicating Benefits from Vaccines Beyond Preventing Infectious Diseases. Infect Dis Ther 2020; 9:467-480. [PMID: 32583334 PMCID: PMC7452969 DOI: 10.1007/s40121-020-00312-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2020] [Indexed: 02/06/2023] Open
Abstract
Despite immunisation being one of the greatest medical success stories of the twentieth century, there is a growing lack of confidence in some vaccines. Improving communication about the direct benefits of vaccination as well as its benefits beyond preventing infectious diseases may help regain this lost confidence. A conference was organised at the Fondation Merieux in France to discuss what benefits could be communicated and how innovative digital initiatives can be used for communication. During this meeting, a wide range of indirect benefits of vaccination were discussed. For example, influenza vaccination can reduce hospitalisations and deaths in older persons with diabetes by 45% and 38%, respectively, but the link between influenza and complications from underlying chronic non-communicable diseases such as diabetes is frequently underestimated. Vaccination can reduce antimicrobial resistance (AMR), which is growing, by reducing the incidence of infectious disease (through direct and indirect or herd protection), by reducing the number of circulating AMR strains, and by reducing the need for antimicrobial use. Disease morbidity and treatment costs in the elderly population are likely to rise substantially with the ageing global population. Healthy ageing and life-course vaccination approaches can reduce the burden of vaccine-preventable diseases, such as seasonal influenza and pneumococcal diseases, which place a significant burden on individuals and society, while improving quality of life. Novel disease surveillance systems based on information from Internet search engines, mobile phone apps, social media, cloud-based electronic health records, and crowd-sourced systems contribute to improved awareness of disease burden. Examples of the role of new techniques and tools, such as artificial intelligence, for processing data generated by multiple sources to support vaccination programmes, such as those for influenza and dengue, were discussed. 
The conference participants agreed that continual efforts are needed from all stakeholders to ensure effective, transparent communication of the full benefits and risks of vaccination.
Affiliation(s)
| | - Hayley Ashbaugh
- Department of Epidemiology, UCLA Fielding School of Public Health, Los Angeles, CA, USA
| | | | - Gaetan Gavazzi
- Geriatric Clinic, Grenoble-Alpes University Hospital, GREPI EA, Grenoble-Alpes University, 7408, Grenoble, France
| | | | - Nansa Burlet
- Global head Patient Insights Innovation, Sanofi Pasteur, Lyon, France
| | - Myint Tin Tin Htar
- Medical Development and Scientific/Clinical Affairs, Pfizer Inc., Paris, France
| |
|
27
|
Lu FS, Nguyen AT, Link NB, Davis JT, Chinazzi M, Xiong X, Vespignani A, Lipsitch M, Santillana M. Estimating the Cumulative Incidence of COVID-19 in the United States Using Four Complementary Approaches. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2020:2020.04.18.20070821. [PMID: 32587997 PMCID: PMC7310656 DOI: 10.1101/2020.04.18.20070821] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]
Abstract
Effectively designing and evaluating public health responses to the ongoing COVID-19 pandemic requires accurate estimation of the prevalence of COVID-19 across the United States (US). Equipment shortages and varying testing capabilities have, however, hindered the usefulness of the official reported positive COVID-19 case counts. We introduce four complementary approaches to estimate the cumulative incidence of symptomatic COVID-19 in each state in the US as well as Puerto Rico and the District of Columbia, using a combination of excess influenza-like illness reports, COVID-19 test statistics, COVID-19 mortality reports, and a spatially structured epidemic model. Instead of relying on the estimate from a single data source or method that may be biased, we provide multiple estimates, each relying on different assumptions and data sources. Across our four approaches emerges the consistent conclusion that on April 4, 2020, the estimated case count was 5 to 50 times higher than the official positive test counts across the different states. Nationally, our estimates of COVID-19 symptomatic cases as of April 4 have a likely range of 2.2 to 4.9 million, with possibly as many as 8.1 million cases, up to 26 times greater than the cumulative confirmed cases of about 311,000. Extending our method to May 16, 2020, we estimate that cumulative symptomatic incidence ranges from 6.0 to 10.3 million, as opposed to 1.5 million positive test counts. The proposed combination of approaches may prove useful in assessing the burden of COVID-19 during resurgences in the US and other countries with comparable surveillance systems.
Affiliation(s)
- Fred S. Lu
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
- Department of Statistics, Stanford University, Stanford, CA
| | - Andre T. Nguyen
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
- University of Maryland, Baltimore County, Baltimore, MD
- Booz Allen Hamilton, Columbia, MD
| | - Nicholas B. Link
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
| | - Jessica T. Davis
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA USA
| | - Matteo Chinazzi
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA USA
| | - Xinyue Xiong
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA USA
| | - Alessandro Vespignani
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA USA
| | - Marc Lipsitch
- Department of Epidemiology, Harvard T.H. Chan School of Public Health
| | - Mauricio Santillana
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health
- Department of Pediatrics, Harvard Medical School, Boston, MA
| |
|
28
|
Aiken EL, McGough SF, Majumder MS, Wachtel G, Nguyen AT, Viboud C, Santillana M. Real-time estimation of disease activity in emerging outbreaks using internet search information. PLoS Comput Biol 2020; 16:e1008117. [PMID: 32804932 PMCID: PMC7451983 DOI: 10.1371/journal.pcbi.1008117] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2019] [Revised: 08/27/2020] [Accepted: 07/01/2020] [Indexed: 11/18/2022] Open
Abstract
Understanding the behavior of emerging disease outbreaks in, or ahead of, real-time could help healthcare officials better design interventions to mitigate impacts on affected populations. Most healthcare-based disease surveillance systems, however, have significant inherent reporting delays due to data collection, aggregation, and distribution processes. Recent work has shown that machine learning methods leveraging a combination of traditionally collected epidemiological information and novel Internet-based data sources, such as disease-related Internet search activity, can produce meaningful "nowcasts" of disease incidence ahead of healthcare-based estimates, with most successful case studies focusing on endemic and seasonal diseases such as influenza and dengue. Here, we apply similar computational methods to emerging outbreaks in geographic regions where no historical presence of the disease of interest has been observed. By combining the limited historical epidemiological data available with disease-related Internet search activity, we retrospectively estimate disease activity in five recent outbreaks weeks ahead of traditional surveillance methods. We find that the proposed computational methods frequently provide useful real-time incidence estimates that can help fill temporal data gaps resulting from surveillance reporting delays. However, the proposed methods are limited by issues of sample bias and skew in search query volumes, perhaps as a result of media coverage.
Affiliation(s)
- Emily L. Aiken
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
| | - Sarah F. McGough
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Maimuna S. Majumder
- Department of Healthcare Policy, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Gal Wachtel
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
| | - Andre T. Nguyen
- Booz Allen Hamilton, Columbia, Maryland, United States of America
- University of Maryland, Baltimore County, Baltimore, Maryland, United States of America
| | - Cecile Viboud
- Fogarty International Center, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Mauricio Santillana
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, United States of America
| |
|
29
|
Cousins HC, Cousins CC, Harris A, Pasquale LR. Regional Infoveillance of COVID-19 Case Rates: Analysis of Search-Engine Query Patterns. J Med Internet Res 2020; 22:e19483. [PMID: 32692691 PMCID: PMC7394521 DOI: 10.2196/19483] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2020] [Revised: 07/13/2020] [Accepted: 07/19/2020] [Indexed: 01/25/2023] Open
Abstract
BACKGROUND Timely allocation of medical resources for coronavirus disease (COVID-19) requires early detection of regional outbreaks. Internet browsing data may predict case outbreaks in local populations that are yet to be confirmed. OBJECTIVE We investigated whether search-engine query patterns can help to predict COVID-19 case rates at the state and metropolitan area levels in the United States. METHODS We used regional confirmed case data from the New York Times and Google Trends results from 50 states and 166 county-based designated market areas (DMA). We identified search terms whose activity precedes and correlates with confirmed case rates at the national level. We used univariate regression to construct a composite explanatory variable based on best-fitting search queries offset by temporal lags. We measured the raw and z-transformed Pearson correlation and root-mean-square error (RMSE) of the explanatory variable with out-of-sample case rate data at the state and DMA levels. RESULTS Predictions were highly correlated with confirmed case rates at the state (mean r=0.69, 95% CI 0.51-0.81; median RMSE 1.27, IQR 1.48) and DMA levels (mean r=0.51, 95% CI 0.39-0.61; median RMSE 4.38, IQR 1.80), using search data available up to 10 days prior to confirmed case rates. They fit case-rate activity in 49 of 50 states and in 103 of 166 DMA at a significance level of .05. CONCLUSIONS Identifiable patterns in search query activity may help to predict emerging regional outbreaks of COVID-19, although they remain vulnerable to stochastic changes in search intensity.
Affiliation(s)
- Henry C Cousins
- Department of Genetics, Stanford School of Medicine, Stanford, CA, United States
| | - Clara C Cousins
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States; Department of Data Sciences, Dana-Farber Cancer Institute, Harvard TH Chan School of Public Health, Boston, MA, United States; Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, United States
| | - Alon Harris
- Department of Ophthalmology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Louis R Pasquale
- Department of Ophthalmology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| |
|
30
|
Gónzalez-Bandala DA, Cuevas-Tello JC, Noyola DE, Comas-García A, García-Sepúlveda CA. Computational Forecasting Methodology for Acute Respiratory Infectious Disease Dynamics. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:ijerph17124540. [PMID: 32599746 PMCID: PMC7344846 DOI: 10.3390/ijerph17124540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/01/2020] [Revised: 06/06/2020] [Accepted: 06/07/2020] [Indexed: 11/16/2022]
Abstract
The study of infectious disease behavior has been a scientific concern for many years, as early identification of outbreaks provides great advantages, including timely implementation of public health measures to limit the spread of an epidemic. We propose a methodology that merges the predictions of (i) a computational model with machine learning, (ii) a projection model, and (iii) a proposed smoothed endemic channel calculation. The predictions are made on weekly acute respiratory infection (ARI) data obtained from epidemiological reports in Mexico, along with the usage of key terms in the Google search engine. The results obtained with this methodology were compared with state-of-the-art techniques, resulting in reduced root mean squared percentage error (RMSPE) and maximum absolute percent error (MAPE) metrics, achieving a MAPE of 21.7%. This methodology could be extended to detect and raise alerts on possible outbreaks of ARI as well as of other seasonal infectious diseases.
Affiliation(s)
| | | | - Daniel E. Noyola
- Microbiology Department, Medicine Faculty, UASLP, San Luis Potosí 78290, Mexico; (D.E.N.); (A.C.-G.)
| | - Andreu Comas-García
- Microbiology Department, Medicine Faculty, UASLP, San Luis Potosí 78290, Mexico; (D.E.N.); (A.C.-G.)
| | | |
|
31
|
Strauss R, Lorenz E, Kristensen K, Eibach D, Torres J, May J, Castro J. Investigating the utility of Google trends for Zika and Chikungunya surveillance in Venezuela. BMC Public Health 2020; 20:947. [PMID: 32546159 PMCID: PMC7298838 DOI: 10.1186/s12889-020-09059-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2019] [Accepted: 06/04/2020] [Indexed: 11/10/2022] Open
Abstract
INTRODUCTION Chikungunya and Zika Virus are vector-borne diseases responsible for a substantial disease burden in the Americas. Between 2013 and 2016, no cases of Chikungunya or Zika Virus were reported by the Venezuelan Ministry of Health. However, peaks of undiagnosed fever cases were observed during the same period. In the context of scarce data, alternative surveillance methods are needed. Assuming that unusual peaks of acute fever cases correspond to the incidence of both diseases, this study aims to evaluate the use of Google Trends as an indicator of the epidemic behavior of Chikungunya and Zika. METHODS Time-series cross-correlations of acute fever cases reported by the Venezuelan Ministry of Health and data on Google search queries related to Chikungunya and Zika were calculated. RESULTS A temporal distinction was made so that acute febrile cases occurring between 25 June 2014 and 23 April 2015 were attributed to the Chikungunya virus, while cases occurring between 30 April 2015 and 29 April 2016 were ascribed to the Zika virus. The highest cross-correlations for each disease were found at a lag of 0 (r = 0.784) for Chikungunya and at a lag of +1 (r = 0.754) for Zika. CONCLUSION The strong positive correlation between Google search queries and official data on acute febrile cases suggests that this resource can be used as an indicator of endemic urban arbovirus activity. In the Venezuelan context, Internet search queries might help to overcome some of the gaps that exist in the national surveillance system.
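The lagged cross-correlation described in the METHODS above can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, the series are made up, and the sign convention (a positive lag means search activity leads reported cases) is an assumption, since the abstract does not define the direction of its +1 lag.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def lagged_correlation(searches, cases, lag):
    """Correlate searches[t] with cases[t + lag].

    Assumed convention: lag > 0 means searches lead cases by `lag` periods.
    """
    if lag > 0:
        x, y = searches[:-lag], cases[lag:]
    elif lag < 0:
        x, y = searches[-lag:], cases[:lag]
    else:
        x, y = searches, cases
    return pearson(x, y)


def best_lag(searches, cases, max_lag):
    """Return the lag in [-max_lag, max_lag] with the highest correlation."""
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda k: lagged_correlation(searches, cases, k))


# Hypothetical weekly series in which cases echo searches one week later.
search_volume = [1, 3, 2, 5, 4, 6, 8, 7]
fever_cases = [0, 1, 3, 2, 5, 4, 6, 8]
print(best_lag(search_volume, fever_cases, max_lag=2))
```

Scanning a small symmetric window of lags and reporting the maximum mirrors how a result such as "highest at lag 0" or "highest at lag +1" is typically obtained; each lag shortens the overlapping window, so very large lags on short series should be read with caution.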
Affiliation(s)
- Ricardo Strauss
- Bernhard Nocht Institute for Tropical Medicine, Research Group Infectious Disease Epidemiology, Hamburg, Germany.
| | - Eva Lorenz
- Bernhard Nocht Institute for Tropical Medicine, Research Group Infectious Disease Epidemiology, Hamburg, Germany; Institute of Medical Biostatistics, Epidemiology and Informatics, University Medical Center, Mainz, Germany
| | - Kaja Kristensen
- Bernhard Nocht Institute for Tropical Medicine, Research Group Infectious Disease Epidemiology, Hamburg, Germany; Faculty of Life Sciences, Hamburg University of Applied Sciences, Ulmenliet 20, 21033, Hamburg, Germany
| | - Daniel Eibach
- Bernhard Nocht Institute for Tropical Medicine, Research Group Infectious Disease Epidemiology, Hamburg, Germany
| | - Jaime Torres
- Instituto de Medicina Tropical, Universidad Central de Venezuela, Caracas, Venezuela
| | - Jürgen May
- Bernhard Nocht Institute for Tropical Medicine, Research Group Infectious Disease Epidemiology, Hamburg, Germany
| | - Julio Castro
- Instituto de Medicina Tropical, Universidad Central de Venezuela, Caracas, Venezuela
| |
|
32
|
Romero-Alvarez D, Parikh N, Osthus D, Martinez K, Generous N, Del Valle S, Manore CA. Google Health Trends performance reflecting dengue incidence for the Brazilian states. BMC Infect Dis 2020; 20:252. [PMID: 32228508 PMCID: PMC7104526 DOI: 10.1186/s12879-020-04957-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2019] [Accepted: 03/10/2020] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Dengue fever is a mosquito-borne infection transmitted by Aedes aegypti and mainly found in tropical and subtropical regions worldwide. Since the re-introduction of dengue in 1986, Brazil has become a hotspot for the disease and has experienced yearly epidemics. Because dengue is a notifiable infectious disease, Brazil uses a passive epidemiological surveillance system to collect and report cases; however, dengue burden is underestimated. Thus, Internet data streams may complement surveillance activities by providing real-time information in the face of reporting lags. METHODS We analyzed 19 terms related to dengue using Google Health Trends (GHT), a free Internet data source, and compared them with weekly dengue incidence between 2011 and 2016. We correlated GHT data with dengue incidence at the national and state level for Brazil, using the adjusted R squared statistic as the primary outcome measure (0/1). We used survey data on Internet access and variables from the official census of 2010 to identify where GHT could be useful in tracking dengue dynamics. Finally, we used a standardized volatility index on dengue incidence and developed models with different variables with the same objective. RESULTS Of the 19 terms explored with GHT, only seven were able to consistently track dengue. Of the 27 states, only 12 reported an adjusted R squared higher than 0.8; these states were distributed mainly in the Northeast, Southeast, and South of Brazil. The usefulness of GHT was explained by the logarithm of the number of Internet users in the last 3 months, the total population per state, and the standardized volatility index. CONCLUSIONS The potential contribution of GHT in complementing traditional established surveillance strategies should be analyzed in the context of geographical resolutions smaller than countries. For Brazil, GHT implementation should be analyzed on a case-by-case basis. 
State variables including total population, Internet usage in the last 3 months, and the standardized volatility index could serve as indicators determining when GHT could complement dengue state level surveillance in other countries.
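The adjusted R squared statistic used as the outcome measure in this abstract can be computed from an ordinary R squared with the standard correction for the number of predictors. The sketch below uses the textbook formula; the function name and the example numbers are assumptions for illustration and are not taken from the study.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    n_obs is the number of observations (n) and n_predictors the number
    of explanatory variables (p), excluding the intercept.
    """
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof


# Hypothetical fit: 101 weekly observations, 5 search terms, R^2 of 0.85.
value = adjusted_r_squared(0.85, n_obs=101, n_predictors=5)
print(value, value > 0.8)
```

Because the correction grows with the number of predictors, a model built on many search terms needs a noticeably higher raw R squared to clear a threshold such as the 0.8 cut-off mentioned in the RESULTS.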
Affiliation(s)
- Daniel Romero-Alvarez
- Department of Ecology & Evolutionary Biology and Biodiversity Institute, University of Kansas, Lawrence, Kansas, USA.
- Information Systems and Modeling (A-1), Los Alamos National Laboratory, Los Alamos, NM, USA.
| | - Nidhi Parikh
- Information Systems and Modeling (A-1), Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Dave Osthus
- Statistical Sciences (CCS-6), Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Kaitlyn Martinez
- Information Systems and Modeling (A-1), Los Alamos National Laboratory, Los Alamos, NM, USA
- Applied Math and Statistics, Colorado School of Mines, Golden, CO, USA
| | - Nicholas Generous
- National Security & Defense Program Office (GS-NSD), Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Sara Del Valle
- Information Systems and Modeling (A-1), Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Carrie A Manore
- Information Systems and Modeling (A-1), Los Alamos National Laboratory, Los Alamos, NM, USA
| |
|
33
|
Slums, Space, and State of Health-A Link between Settlement Morphology and Health Data. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:ijerph17062022. [PMID: 32204347 PMCID: PMC7143924 DOI: 10.3390/ijerph17062022] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/19/2020] [Revised: 03/06/2020] [Accepted: 03/13/2020] [Indexed: 12/31/2022]
Abstract
Approximately 1 billion slum dwellers worldwide are exposed to increased health risks due to their spatial environment. Recent studies have therefore called for the spatial environment to be introduced as a separate dimension in medical studies. Hence, this study investigates how and on which spatial scale relationships between the settlement morphology and the health status of the inhabitants can be identified. To this end, we summarize the current literature on the identification of slums from a geographical perspective and review the current literature on slums and health of the last five years (376 studies) focusing on the considered scales in the studies. We show that the majority of medical studies are restricted to certain geographical regions. It is desirable that the number of studies be adapted to the number of the respective population. On the basis of these studies, we develop a framework to investigate the relationship between space and health. Finally, we apply our methodology to investigate the relationship between the prevalence of slums and different health metrics using data of the global burden of diseases for different prefectures in Brazil on a subnational level.
|
34
|
Samaras L, García-Barriocanal E, Sicilia MA. Syndromic surveillance using web data: a systematic review. INNOVATION IN HEALTH INFORMATICS 2020. [PMCID: PMC7153324 DOI: 10.1016/b978-0-12-819043-2.00002-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
In recent years, there has been much debate about the evolution of Smart Healthcare systems, particularly about how these systems can help improve people's health by taking advantage of new Information and Communication Technologies (ICT) for early prediction and efficient treatment. The purpose of this study is to provide a systematic review of the current literature on information systems for syndromic surveillance using web data. The published items considered include articles, books, reviews, reports, conference announcements, and dissertations. We used a variation of the PRISMA Statement methodology to conduct the systematic review. The review identifies the relevant papers published from 2004 to 2018 and systematically explores them to extract similarities, gaps, and conclusions from the research done so far. The results presented concern the year, the examined disease, the web data source, the geographic location/country, and the data analysis method used. The results show that influenza is the most examined infectious disease. The internet tools most used are Twitter and Google. Regarding the geographical areas explored in the published papers, the most examined country is the United States, since many scientists come from this country. There has been significant growth in the number of articles since 2009. Various statistical methods are used to correlate the data retrieved from the internet with the data from national authorities. The common conclusion of the reviewed research is that the Web can be a useful tool for detecting serious epidemics and for creating a syndromic surveillance system, since epidemics can be predicted from web data before they are officially detected in the population. 
With the advance of ICT, Smart Healthcare can benefit from the epidemic monitoring and early prediction that such a system offers, improving national or international health strategies and policy decisions. This can be achieved through the provision of new technology tools to enhance health monitoring systems toward the new innovations of Smart Health or eHealth, even with the emerging technologies of the Internet of Things. The challenges and impacts of an electronic system based on internet data span the social, medical, and technological disciplines. These can be further extended to Smart Healthcare, as data streaming can provide real-time information, awareness of epidemics, and alerts for both patients and medical scientists. Finally, these new systems can help improve the standards of human life.
|
35
|
Tideman S, Santillana M, Bickel J, Reis B. Internet search query data improve forecasts of daily emergency department volume. J Am Med Inform Assoc 2019; 26:1574-1583. [PMID: 31730701 PMCID: PMC7647136 DOI: 10.1093/jamia/ocz154] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2019] [Revised: 07/25/2019] [Accepted: 08/06/2019] [Indexed: 11/15/2022] Open
Abstract
OBJECTIVE Emergency departments (EDs) are increasingly overcrowded. Forecasting patient visit volume is challenging. Reliable and accurate forecasting strategies may help improve resource allocation and mitigate the effects of overcrowding. Patterns related to weather, day of the week, season, and holidays have previously been used to forecast ED visits. Internet search activity has proven useful for predicting disease trends and offers a new opportunity to improve ED visit forecasting. This study tests whether Google search data and relevant statistical methods can improve the accuracy of ED volume forecasting compared with traditional data sources. MATERIALS AND METHODS Seven years of historical daily ED arrivals were collected from Boston Children's Hospital. We used data from the public school calendar, the National Oceanic and Atmospheric Administration, and Google Trends. Multiple linear models using LASSO (least absolute shrinkage and selection operator) for variable selection were created. The models were trained on 5 years of data, and out-of-sample accuracy was judged using multiple error metrics on the final 2 years. RESULTS All data sources added complementary predictive power. Our baseline day-of-the-week model recorded an average percent error of 10.99%. Autoregressive terms, calendar data, and weather data reduced the error to 7.71%. Search volume data reduced the error to 7.58%, theoretically preventing 4 improperly staffed days. DISCUSSION The predictive power provided by the search volume data may stem from the ability to capture population-level interaction with events, such as winter storms and infectious diseases, that traditional data sources alone miss. CONCLUSIONS This study demonstrates that search volume data can meaningfully improve forecasting of ED visit volume and could help improve quality and reduce cost.
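To make the baseline and the error metric in the abstract above concrete, the following is a minimal sketch of a day-of-the-week baseline scored by average percent error. It is not the authors' code: the function names and the toy arrival counts are hypothetical, and the study's actual models (LASSO with calendar, weather, and search features) are not reproduced here.

```python
from collections import defaultdict


def fit_day_of_week_baseline(history):
    """Return mean arrivals per weekday from (weekday, arrivals) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for weekday, arrivals in history:
        totals[weekday] += arrivals
        counts[weekday] += 1
    return {day: totals[day] / counts[day] for day in totals}


def average_percent_error(actual, predicted):
    """Mean absolute percent error, in percent (assumes no zero actuals)."""
    errors = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical toy data: weekday index (0 = Monday, 1 = Tuesday) and arrivals.
train = [(0, 100), (1, 80), (0, 120), (1, 100)]
test = [(0, 100), (1, 100)]

baseline = fit_day_of_week_baseline(train)            # per-weekday means
predictions = [baseline[day] for day, _ in test]      # forecast = weekday mean
error = average_percent_error([a for _, a in test], predictions)
print(f"average percent error: {error:.2f}%")
```

Training on one period and scoring on a later, held-out period mirrors the out-of-sample evaluation described in the abstract; richer models would replace only the forecasting step.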
Affiliation(s)
- Sam Tideman: Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA
- Mauricio Santillana: Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA; Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Jonathan Bickel: Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA; Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Ben Reis: Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA; Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA; Predictive Medicine Group, Boston Children’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
|
36
|
Rangarajan P, Mody SK, Marathe M. Forecasting dengue and influenza incidences using a sparse representation of Google trends, electronic health records, and time series data. PLoS Comput Biol 2019; 15:e1007518. [PMID: 31751346 PMCID: PMC6894887 DOI: 10.1371/journal.pcbi.1007518] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 01/07/2019] [Revised: 12/05/2019] [Accepted: 10/29/2019] [Indexed: 12/20/2022] Open
Abstract
Dengue and influenza-like illness (ILI) are two of the leading causes of viral infection in the world, and it is estimated that more than half the world’s population is at risk of developing these infections. It is therefore important to develop accurate methods for forecasting dengue and ILI incidences. Although data from multiple sources (such as dengue and ILI case counts, electronic health records, and the frequency of multiple internet search terms from Google Trends) can improve forecasts, standard time series analysis methods cannot adequately estimate all the parameter values from the limited amount of data available when multiple sources are used. In this paper, we use a computationally efficient implementation of the known variable selection method that we call the Autoregressive Likelihood Ratio (ARLR) method. This method combines sparse representation of time series data, electronic health records data (for ILI), and Google Trends data to forecast dengue and ILI incidences. This sparse representation method uses an algorithm that maximizes an appropriate likelihood ratio at every step. Using numerical experiments, we demonstrate that our method recovers the underlying sparse model much more accurately than the lasso method. We apply our method to dengue case count data from five countries/states (Brazil, Mexico, Singapore, Taiwan, and Thailand) and to ILI case count data from the United States. Numerical experiments show that our method outperforms existing time series forecasting methods in forecasting the dengue and ILI case counts. In particular, our method gives an 18 percent forecast error reduction over a leading method that also uses data from multiple sources. It also performs better than other methods in predicting the peak value of the case count and the peak time.
Dengue and influenza-like illness (ILI) are leading causes of viral infection in the world, and hence it is important to develop accurate methods for forecasting their incidence. We use the Autoregressive Likelihood Ratio method, which is a computationally efficient implementation of the variable selection method, to obtain a sparse (non-lasso) representation of time series, Google Trends, and electronic health records (for ILI) data. This method is used to forecast dengue incidence in five countries/states and ILI incidence in the USA. We show that this method outperforms existing time series methods in forecasting these diseases. The method is general and can also be used to forecast other diseases.
Affiliation(s)
- Prashant Rangarajan: Departments of Computer Science and Mathematics, Birla Institute of Technology and Science, Pilani, India
- Sandeep K. Mody: Department of Mathematics, Indian Institute of Science, Bangalore, India
- Madhav Marathe: Department of Computer Science, Network, Simulation Science and Advanced Computing Division, Biocomplexity Institute, University of Virginia, Charlottesville, Virginia, United States of America
|
37
|
Chesnut M, Muñoz LS, Harris G, Freeman D, Gama L, Pardo CA, Pamies D. In vitro and in silico Models to Study Mosquito-Borne Flavivirus Neuropathogenesis, Prevention, and Treatment. Front Cell Infect Microbiol 2019; 9:223. [PMID: 31338335 PMCID: PMC6629778 DOI: 10.3389/fcimb.2019.00223] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 02/28/2019] [Accepted: 06/11/2019] [Indexed: 01/07/2023] Open
Abstract
Mosquito-borne flaviviruses can cause disease in the nervous system, resulting in a significant burden of morbidity and mortality. Disease models are necessary to understand neuropathogenesis and identify potential therapeutics and vaccines. Non-human primates have been used extensively but present major challenges. Advances have also been made toward the development of humanized mouse models, but these models still do not fully represent human pathophysiology. Recent developments in stem cell technology and cell culture techniques have allowed the development of more physiologically relevant human cell-based models. In silico modeling has also allowed researchers to identify and predict transmission patterns and discover potential vaccine and therapeutic candidates. This review summarizes the research on in vitro and in silico models used to study three mosquito-borne flaviviruses that cause neurological disease in humans: West Nile, Dengue, and Zika. We also propose a roadmap for 21st century research on mosquito-borne flavivirus neuropathogenesis, prevention, and treatment.
Affiliation(s)
- Megan Chesnut: Center for Alternatives to Animal Testing, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Laura S. Muñoz: Division of Neuroimmunology, Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, United States; Neuroviruses Emerging in the Americas Study, Johns Hopkins University School of Medicine, Baltimore, MD, United States
- Georgina Harris: Center for Alternatives to Animal Testing, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Dana Freeman: Department of Environmental Health and Engineering, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Lucio Gama: Department of Molecular and Comparative Pathobiology, Johns Hopkins University School of Medicine, Baltimore, MD, United States; Vaccine Research Center, National Institute of Allergy and Infectious Diseases, Bethesda, MD, United States
- Carlos A. Pardo: Division of Neuroimmunology, Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, United States; Neuroviruses Emerging in the Americas Study, Johns Hopkins University School of Medicine, Baltimore, MD, United States
- David Pamies: Center for Alternatives to Animal Testing, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States; Department of Physiology, University of Lausanne, Lausanne, Switzerland
|
38
|
Mendivelso Duarte FO, Robayo García A, Rodríguez Bedoya M, Suárez Rángel G. [Reporting of birth defects from the Zika outbreak in Colombia, 2015-2017]. Rev Panam Salud Publica 2019; 43:e38. [PMID: 31093262 PMCID: PMC6499088 DOI: 10.26633/rpsp.2019.38] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/24/2018] [Accepted: 02/11/2019] [Indexed: 11/24/2022] Open
Abstract
Objective. The Zika virus outbreak affected several tropical countries during 2015 and 2016. This made it necessary to create intensified surveillance strategies for microcephaly and other neurological syndromes. The effect of the Zika virus outbreak on the reporting of birth defects in Colombia was evaluated from the perspective of the national surveillance system. Methods. National reporting of newborns with different birth defects was analyzed, and the variations in reporting attributable to the epidemic were determined using a semiparametric model known as "difference in differences" (DID). Results. A total of 18,234 cases of birth defects were reported in Colombia during the study period. Most were congenital malformations (91.9%). A total of 82.3% were confirmed by clinical diagnosis or epidemiological link. For microcephaly, eight new cases were reported per epidemiological week (case reporting coefficient [D] = 8.8; P = 0.000), along with 32 cases of other anatomical congenital malformations (D = 32.0; P = 0.000). The absolute value of the difference-in-differences estimator attributed to the Zika virus outbreak increased the weekly reporting of cases of microcephaly (DID = |-5.0|; P = 0.008) and congenital malformations (DID = |-12.0|; P = 0.111). Conclusions. The Zika virus outbreak increased the reporting of newborns with microcephaly, but without any significant variation in the reporting of other malformations and functional birth defects of sensory or metabolic origin in the surveillance system.
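For readers unfamiliar with the difference-in-differences idea named in the abstract above, the following is a minimal sketch of the classic two-group, two-period estimator computed from group means. It is an illustration only: the study used a semiparametric DID model that is not reproduced here, and the function name and weekly counts below are hypothetical.

```python
from statistics import mean


def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus the change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical weekly report counts before and after an outbreak begins.
did = difference_in_differences(
    treated_pre=[3, 5],      # condition of interest, before
    treated_post=[10, 12],   # condition of interest, after
    control_pre=[20, 22],    # comparison condition, before
    control_post=[23, 25],   # comparison condition, after
)
print(did)
```

Subtracting the control group's change removes trends shared by both groups, so the remainder is the part of the change attributed to the event, under the usual assumption that both groups would otherwise have followed parallel trends.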
Affiliation(s)
- Fredy Orlando Mendivelso Duarte: Centro de Medicina Basada en la Evidencia Keralty, Bogotá, Colombia
- Adriana Robayo García: Programa de Entrenamiento en Epidemiología de Campo (FETP) del Instituto Nacional de Salud de Colombia, Colombia
- Milena Rodríguez Bedoya: Fundación Universitaria Sanitas, Bogotá, Colombia
- Gloria Suárez Rángel: Programa de Entrenamiento en Epidemiología de Campo (FETP) del Instituto Nacional de Salud de Colombia, Colombia
|
39
|
Bartlow AW, Manore C, Xu C, Kaufeld KA, Del Valle S, Ziemann A, Fairchild G, Fair JM. Forecasting Zoonotic Infectious Disease Response to Climate Change: Mosquito Vectors and a Changing Environment. Vet Sci 2019; 6:E40. [PMID: 31064099 PMCID: PMC6632117 DOI: 10.3390/vetsci6020040] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Received: 03/09/2019] [Revised: 04/12/2019] [Accepted: 04/29/2019] [Indexed: 12/20/2022] Open
Abstract
Infectious diseases are changing due to the environment and altered interactions among hosts, reservoirs, vectors, and pathogens. This is particularly true for zoonotic diseases that infect humans, agricultural animals, and wildlife. Within the subset of zoonoses, vector-borne pathogens are changing more rapidly with climate change, and have a complex epidemiology, which may allow them to take advantage of a changing environment. Most mosquito-borne infectious diseases are transmitted by mosquitoes in three genera: Aedes, Anopheles, and Culex, and the expansion of these genera is well documented. There is an urgent need to study vector-borne diseases in response to climate change and to produce a generalizable approach capable of generating risk maps and forecasting outbreaks. Here, we provide a strategy for coupling climate and epidemiological models for zoonotic infectious diseases. We discuss the complexity and challenges of data and model fusion, baseline requirements for data, and animal and human population movement. Disease forecasting needs significant investment to build the infrastructure necessary to collect data about the environment, vectors, and hosts at all spatial and temporal resolutions. These investments can contribute to building a modeling community around the globe to support public health officials so as to reduce disease burden through forecasts with quantified uncertainty.
Affiliation(s)
- Andrew W Bartlow: Los Alamos National Laboratory, Biosecurity and Public Health, Los Alamos, NM 87545, USA
- Carrie Manore: Los Alamos National Laboratory, Information Systems and Modeling, Los Alamos, NM 87545, USA
- Chonggang Xu: Los Alamos National Laboratory, Earth Systems Observations, Los Alamos, NM 87545, USA
- Kimberly A Kaufeld: Los Alamos National Laboratory, Statistical Sciences, Los Alamos, NM 87545, USA
- Sara Del Valle: Los Alamos National Laboratory, Information Systems and Modeling, Los Alamos, NM 87545, USA
- Amanda Ziemann: Los Alamos National Laboratory, Space Data Science and Systems, Los Alamos, NM 87545, USA
- Geoffrey Fairchild: Los Alamos National Laboratory, Information Systems and Modeling, Los Alamos, NM 87545, USA
- Jeanne M Fair: Los Alamos National Laboratory, Biosecurity and Public Health, Los Alamos, NM 87545, USA
|
40
|
Clemente L, Lu F, Santillana M. Improved Real-Time Influenza Surveillance: Using Internet Search Data in Eight Latin American Countries. JMIR Public Health Surveill 2019; 5:e12214. [PMID: 30946017 PMCID: PMC6470460 DOI: 10.2196/12214] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 09/14/2018] [Revised: 02/11/2019] [Accepted: 02/15/2019] [Indexed: 01/18/2023] Open
Abstract
Background Novel influenza surveillance systems that leverage Internet-based real-time data sources, including Internet search frequencies, social-network information, and crowd-sourced flu surveillance tools, have shown improved accuracy over the past few years in data-rich countries like the United States. These systems not only track flu activity accurately, but they also report flu estimates a week or more ahead of the publication of reports produced by healthcare-based systems, such as those implemented and managed by the Centers for Disease Control and Prevention. Previous work has shown that novel flu surveillance systems, like Google Flu Trends (GFT), have not yet delivered acceptable flu estimates in developing countries in Latin America. Objective The aim of this study was to show that recent methodological improvements in the use of Internet search engine information to track diseases can lead to improved retrospective flu estimates in multiple countries in Latin America. Methods A machine learning-based methodology that uses flu-related Internet search activity and historical information to monitor flu activity, named ARGO (AutoRegression with Google search), was extended to generate flu predictions for 8 Latin American countries (Argentina, Bolivia, Brazil, Chile, Mexico, Paraguay, Peru, and Uruguay) for the period from January 2012 to December 2016. These retrospective (out-of-sample) influenza activity predictions were compared with historically observed suspected flu cases in each country, as reported by FluNet, an influenza surveillance database maintained by the World Health Organization. For a baseline comparison, retrospective (out-of-sample) flu estimates were produced for the same time period using autoregressive models that only leverage historical flu activity information.
Results Our results show that the predictive power of ARGO-like models outperforms that of autoregressive models in 6 out of 8 countries in the 2012-2016 time period. Moreover, ARGO significantly improves on historical flu estimates produced by the now-discontinued GFT for the time period of 2012-2015, for which GFT information is publicly available. Conclusions We demonstrate here that a self-correcting machine learning method, leveraging Internet-based disease-related search activity and historical flu trends, has the potential to produce reliable and timely flu estimates in multiple Latin American countries. This methodology may prove helpful to local public health officials who design and implement interventions aimed at mitigating the effects of influenza outbreaks. Our methodology generally outperforms both the now-discontinued tool GFT and autoregressive methodologies that exploit only historical flu activity to produce future disease estimates.
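The contrast drawn above between a purely autoregressive baseline and a model that also uses search activity can be sketched as follows. This is a deliberately reduced illustration, not ARGO itself: it fits one lag and one search series by ordinary least squares with no intercept and no regularization, whereas the abstract describes a machine learning method using many search terms and historical information. All names and the synthetic data are hypothetical.

```python
def fit_lag_plus_search(activity, search):
    """Least-squares fit of activity[t] = a * activity[t-1] + b * search[t]."""
    lagged = activity[:-1]    # previous-week activity
    current = search[1:]      # same-week search volume
    target = activity[1:]     # activity to be explained

    # Entries of the 2x2 normal equations.
    s_ll = sum(l * l for l in lagged)
    s_ss = sum(s * s for s in current)
    s_ls = sum(l * s for l, s in zip(lagged, current))
    s_lt = sum(l * t for l, t in zip(lagged, target))
    s_st = sum(s * t for s, t in zip(current, target))

    det = s_ll * s_ss - s_ls * s_ls
    if det == 0:
        raise ValueError("predictors are collinear; cannot fit")
    a = (s_lt * s_ss - s_ls * s_st) / det   # Cramer's rule
    b = (s_ll * s_st - s_ls * s_lt) / det
    return a, b


# Synthetic series generated with known coefficients a = 0.5, b = 2.0.
search = [0.0, 1.0, 2.0, 3.0, 1.0, 2.0]
activity = [1.0]
for t in range(1, len(search)):
    activity.append(0.5 * activity[-1] + 2.0 * search[t])

a, b = fit_lag_plus_search(activity, search)
# Estimate the current week from last week's activity and this week's searches.
nowcast = a * activity[-1] + b * 1.0
print(round(a, 3), round(b, 3), round(nowcast, 3))
```

Setting `b` to zero recovers a one-lag autoregressive baseline, which is the kind of model the search-augmented approach is compared against in the abstract.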
Affiliation(s)
- Leonardo Clemente: School of Engineering and Sciences, Tecnologico de Monterrey, Monterrey, Mexico; Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
- Fred Lu: Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
- Mauricio Santillana: Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States; Department of Pediatrics, Harvard Medical School, Boston, MA, United States
|
41
|
Ning S, Yang S, Kou SC. Accurate regional influenza epidemics tracking using Internet search data. Sci Rep 2019; 9:5238. [PMID: 30918276 PMCID: PMC6437143 DOI: 10.1038/s41598-019-41559-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 06/25/2018] [Accepted: 03/12/2019] [Indexed: 12/12/2022] Open
Abstract
Accurate, high-resolution tracking of influenza epidemics at the regional level helps public health agencies make informed and proactive decisions, especially in the face of outbreaks. Internet users' online searches offer great potential for the regional tracking of influenza. However, due to the complex data structure and reduced quality of Internet data at the regional level, few established methods provide satisfactory performance. In this article, we propose a novel method named ARGO2 (2-step Augmented Regression with GOogle data) that efficiently combines publicly available Google search data at different resolutions (national and regional) with traditional influenza surveillance data from the Centers for Disease Control and Prevention (CDC) for accurate, real-time regional tracking of influenza. ARGO2 gives very competitive performance across all US regions compared with available Internet-data-based regional influenza tracking methods, and it has achieved 30% error reduction over the best alternative method that we numerically tested for the period of March 2009 to March 2018. ARGO2 is reliable and robust, with the flexibility to incorporate additional information from other sources and resolutions, making it a powerful tool for regional influenza tracking, and potentially for tracking other social, economic, or public health events at the regional or local level.
Affiliation(s)
- Shaoyang Ning: Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, 02138, MA, USA
- Shihao Yang: Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, 02138, MA, USA
- S C Kou: Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, 02138, MA, USA
|
42
|
Lu FS, Hattab MW, Clemente CL, Biggerstaff M, Santillana M. Improved state-level influenza nowcasting in the United States leveraging Internet-based data and network approaches. Nat Commun 2019; 10:147. [PMID: 30635558 PMCID: PMC6329822 DOI: 10.1038/s41467-018-08082-0] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 07/25/2018] [Accepted: 12/12/2018] [Indexed: 12/01/2022] Open
Abstract
In the presence of health threats, precision public health approaches aim to provide targeted, timely, and population-specific interventions. Accurate surveillance methodologies that can estimate infectious disease activity ahead of official healthcare-based reports, at relevant spatial resolutions, are important for achieving this goal. Here we introduce a methodological framework which dynamically combines two distinct influenza tracking techniques, using an ensemble machine learning approach, to achieve improved state-level influenza activity estimates in the United States. The two predictive techniques behind the ensemble utilize (1) a self-correcting statistical method combining influenza-related Google search frequencies, information from electronic health records, and historical flu trends within each state, and (2) a network-based approach leveraging spatio-temporal synchronicities observed in historical influenza activity across states. The ensemble considerably outperforms each component method in addition to previously proposed state-specific methods for influenza tracking, with higher correlations and lower prediction errors.
Affiliation(s)
- Fred S Lu: Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02115, USA
- Mohammad W Hattab: Wyss Institute for Biologically Inspired Engineering, Harvard Medical School, Boston, MA, 02115, USA
- Matthew Biggerstaff: Influenza Division, National Center for Immunization and Respiratory Disease, Centers for Disease Control and Prevention, Atlanta, GA, 30333, USA
- Mauricio Santillana: Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02115, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
|
43
|
Gunn LH, Ter Horst E, Markossian TW, Molina G. Online interest regarding violent attacks, gun control, and gun purchase: A causal analysis. PLoS One 2018; 13:e0207924. [PMID: 30485315 PMCID: PMC6261600 DOI: 10.1371/journal.pone.0207924] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 04/08/2018] [Accepted: 11/06/2018] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Increased interest in gun ownership and gun control is often driven by informational shocks in a common factor, namely violent attacks, and by the perceived need for higher levels of safety. A causal depiction of the societal interest around violent attacks, gun control, and gun purchase, both synchronous and over time, should be a stepping stone for designing future strategies regarding the safety concerns of the U.S. population. OBJECTIVE To examine the causal relationships between unexpected increases in population interest in violent attacks, gun control, and gun purchase. METHODS Relationships among online searches for information about violent attacks, gun control, and gun purchase occurring between 2004 and 2017 in the U.S. are explained through a novel structural vector autoregressive time series model that accounts for simultaneous causal relationships. RESULTS More than 20% of the stationary variability in each of gun control and gun purchase interest can be explained by the remaining factors. Gun control interest appears to be caused, in part, by informational shocks about violent attacks, yet violent attacks, although impactful, have a lesser effect than the gun control debate on long-term gun ownership interest. CONCLUSIONS The form in which gun control has been introduced in public debate may have further increased gun ownership interest. Reactive gun purchase interest may be an unintended side effect of the gun control debate. U.S. policymakers may need to rethink current approaches to the promotion of gun control, and consider whether societal policy debate without policy outcomes could be having unintended effects.
Affiliation(s)
- Laura H Gunn: Department of Public Health Sciences, Health Informatics and Analytics Program, University of North Carolina at Charlotte, Charlotte, NC, United States of America; School of Public Health, Faculty of Medicine, Imperial College London, London, United Kingdom
- Enrique Ter Horst: Universidad de los Andes (Uniandes), Facultad de administracion, Bogota, Colombia
- Talar W Markossian: Department of Public Health Sciences, Stritch School of Medicine, Loyola University Chicago, Chicago, Illinois, United States of America
- German Molina: Quantitative Research, Idalion Capital Group, London, United Kingdom
|
44
|
Lilford R, Taiwo OJ, de Albuquerque JP. Characterisation of urban spaces from space: going beyond the urban versus rural dichotomy. Lancet Public Health 2018; 3:e61-e62. [PMID: 29422187 DOI: 10.1016/s2468-2667(18)30008-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 12/13/2017] [Accepted: 01/03/2018] [Indexed: 11/24/2022]
Affiliation(s)
- Richard Lilford: Warwick Medical School, University of Warwick, Coventry CV4 7AL, UK
- Olalekan John Taiwo: Department of Geography, Faculty of the Social Sciences, University of Ibadan, Ibadan, Nigeria
|
45
|
Global Research on Syndromic Surveillance from 1993 to 2017: Bibliometric Analysis and Visualization. Sustainability 2018. [DOI: 10.3390/su10103414] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 01/14/2023]
Abstract
Syndromic Surveillance aims at analyzing medical data to detect clusters of illness or forecast disease outbreaks. Although research in this field is flourishing in terms of publications, an overview of the global research output has been lacking. This paper analyzes the global scientific output of this research from 1993 to 2017 using bibliometric analysis and visualization. In particular, a data processing framework was proposed based on citation datasets collected from Scopus and Clarivate Analytics’ Web of Science Core Collection (WoSCC). The bibliometric method and CiteSpace were used to analyze the institutions, countries, and research areas as well as the current hotspots and trends. The preprocessed dataset includes 14,680 citation records. The analysis identified the USA, England, Canada, France, and Australia as the five most productive countries publishing on Syndromic Surveillance. Among academic institutions, the US Centers for Disease Control and Prevention (CDC) ranks at the top. The reference co-citation analysis uncovered the common research venues, and further analysis of keyword co-occurrence revealed the most trending topics. The findings of this research will help enrich the field with a comprehensive view of the status and future trends of research on Syndromic Surveillance.
|
46
|
Google searches do not correlate with melanoma incidence in majority English speaking countries. NPJ Digit Med 2018; 1:44. [PMID: 31304324 PMCID: PMC6550200 DOI: 10.1038/s41746-018-0050-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/27/2018] [Revised: 08/07/2018] [Accepted: 08/08/2018] [Indexed: 11/21/2022] Open
Abstract
Recent reports have suggested that internet search behaviour may be a valuable tool to estimate melanoma incidence and mortality. Previous studies have used incorrect statistical methods, were focussed on the United States, and/or did not use non-cancer control search terms to provide a context for interpreting the effects seen with the cancer-related terms. Using more robust statistical methods, we found that no cancer search terms were significantly or strongly correlated with melanoma incidence in 6 countries.
|
47
|
Open data mining for Taiwan's dengue epidemic. Acta Trop 2018; 183:1-7. [PMID: 29549012 DOI: 10.1016/j.actatropica.2018.03.017] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 10/29/2017] [Revised: 02/19/2018] [Accepted: 03/10/2018] [Indexed: 11/22/2022]
Abstract
Using a quantitative approach, this study examines the applicability of data mining techniques to discover knowledge from open data related to Taiwan's dengue epidemic. We compare results when Google Trends data are included or excluded. Data sources are government open data, climate data, and Google Trends data. The findings are based on an analysis of 70,914 cases. Location and time (month) in open data show the highest classification power, followed by climate variables (temperature and humidity), whereas gender and age show the lowest values. Both prediction accuracy and simplicity decrease when Google Trends data are considered (0.94 and 0.37, respectively, compared to 0.96 and 0.46). The article demonstrates the value of open data mining in the context of public health care.
|
48
|
Keuschnigg M, Lovsjö N, Hedström P. Analytical sociology and computational social science. Journal of Computational Social Science 2018; 1:3-14. [PMID: 31930176 PMCID: PMC6936355 DOI: 10.1007/s42001-017-0006-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 11/05/2017] [Accepted: 11/09/2017] [Indexed: 05/18/2023]
Abstract
Analytical sociology focuses on social interactions among individuals and the hard-to-predict aggregate outcomes they bring about. It seeks to identify generalizable mechanisms giving rise to emergent properties of social systems which, in turn, feed back on individual decision-making. This research program benefits from computational tools such as agent-based simulations, machine learning, and large-scale web experiments, and has considerable overlap with the nascent field of computational social science. By providing relevant analytical tools to rigorously address sociology's core questions, computational social science has the potential to advance sociology in a similar way that the introduction of econometrics advanced economics during the last half century. Computational social scientists from computer science and physics often see as their main task to establish empirical regularities which they view as "social laws." From the perspective of the social sciences, references to social laws appear unfounded and misplaced, however, and in this article we outline how analytical sociology, with its theory-grounded approach to computational social science, can help to move the field forward from mere descriptions and predictions to the explanation of social phenomena.
Affiliation(s)
- Marc Keuschnigg
  - The Institute for Analytical Sociology, Linköping University, Norra Grytsgatan 10, 601 74 Norrköping, Sweden
- Niclas Lovsjö
  - The Institute for Analytical Sociology, Linköping University, Norra Grytsgatan 10, 601 74 Norrköping, Sweden
- Peter Hedström
  - The Institute for Analytical Sociology, Linköping University, Norra Grytsgatan 10, 601 74 Norrköping, Sweden
|
49
|
Pollett S, Althouse BM, Forshey B, Rutherford GW, Jarman RG. Internet-based biosurveillance methods for vector-borne diseases: Are they novel public health tools or just novelties? PLoS Negl Trop Dis 2017; 11:e0005871. [PMID: 29190281 PMCID: PMC5708615 DOI: 10.1371/journal.pntd.0005871] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 11/18/2022] Open
Abstract
Internet-based surveillance methods for vector-borne diseases (VBDs) using "big data" sources such as Google, Twitter, and internet newswire scraping have recently been developed, yet reviews on such "digital disease detection" methods have focused on respiratory pathogens, particularly in high-income regions. Here, we present a narrative review of the literature that has examined the performance of internet-based biosurveillance for diseases caused by vector-borne viruses, parasites, and other pathogens, including Zika, dengue, other arthropod-borne viruses, malaria, leishmaniasis, and Lyme disease across a range of settings, including low- and middle-income countries. The fundamental features, advantages, and drawbacks of each internet big data source are presented for those with varying familiarity with "digital epidemiology." We conclude with some of the challenges and future directions in using internet-based biosurveillance for the surveillance and control of VBDs.
Affiliation(s)
- Simon Pollett
  - Viral Diseases Branch, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
  - Global Health Sciences, University of California, San Francisco, San Francisco, California, United States of America
  - Marie Bashir Institute, University of Sydney, NSW, Australia
- Benjamin M. Althouse
  - Institute for Disease Modeling, Bellevue, Washington, United States of America
  - Information School, University of Washington, Seattle, Washington, United States of America
  - Department of Biology, New Mexico State University, Las Cruces, New Mexico, United States of America
- Brett Forshey
  - Global Emerging Infections Surveillance Section, Armed Force Health Surveillance Branch, Silver Spring, Maryland, United States of America
  - Cherokee Nation Technology Solutions, Silver Spring, Maryland, United States of America
- George W. Rutherford
  - Global Health Sciences, University of California, San Francisco, San Francisco, California, United States of America
- Richard G. Jarman
  - Viral Diseases Branch, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
|