1
Paradise Vit A, Magid A. Differences in Fear and Negativity Levels Between Formal and Informal Health-Related Websites: Analysis of Sentiments and Emotions. J Med Internet Res 2024; 26:e55151. [PMID: 39120928 PMCID: PMC11344190 DOI: 10.2196/55151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Revised: 05/19/2024] [Accepted: 06/07/2024] [Indexed: 08/10/2024]
Abstract
BACKGROUND Searching for web-based health-related information is frequently performed by the public and may affect public behavior regarding health decision-making. In particular, it may result in anxiety and in erroneous, harmful self-diagnosis. The most searched health-related topics are cancer, cardiovascular diseases, and infectious diseases. A health-related web-based search may return either formal or informal medical websites, both of which may evoke feelings of fear and negativity. OBJECTIVE Our study aimed to assess whether there is a difference in fear and negativity levels between information appearing on formal and informal health-related websites. METHODS A web search was performed to retrieve the contents of websites containing symptoms of selected diseases, using selected common symptoms. Retrieved websites were classified into formal and informal websites. Fear and negativity of each content item were evaluated using 3 transformer models. A fourth transformer model was fine-tuned using an existing emotion data set obtained from a web-based health community. For formal and informal websites, fear and negativity levels were aggregated. t tests were conducted to evaluate the differences in fear and negativity levels between formal and informal websites. RESULTS In this study, unique websites (N=1448) were collected, of which 534 were considered formal and 914 were considered informal. There were 1820 result pages from formal websites and 1494 result pages from informal websites. According to our findings, fear levels were statistically higher (t(2753)=3.331; P<.001) on formal websites (mean 0.388, SD 0.177) than on informal websites (mean 0.366, SD 0.168). The results also show that the level of negativity was statistically higher (t(2753)=2.726; P=.006) on formal websites (mean 0.657, SD 0.211) than on informal websites (mean 0.636, SD 0.201).
CONCLUSIONS Positive texts may increase the credibility of formal health websites, their usage by the general public, and the public's compliance with the recommendations. We recommend increasing the use of natural language processing tools before publishing health-related information, so that the text disseminated to the public is more positive and less stressful.
Affiliation(s)
- Abigail Paradise Vit
  - Department of Information Systems, The Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Avi Magid
  - Department of Information Systems, The Max Stern Yezreel Valley College, Emek Yezreel, Israel
  - Management, Rambam Health Care Campus, Haifa, Israel
2
Wei S, Lin S, Wenjing Z, Shaoxia S, Yuejie Y, Yujie H, Shu Z, Zhong L, Ti L. The prediction of influenza-like illness using national influenza surveillance data and Baidu query data. BMC Public Health 2024; 24:513. [PMID: 38369456 PMCID: PMC10875817 DOI: 10.1186/s12889-024-17978-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 02/04/2024] [Indexed: 02/20/2024]
Abstract
BACKGROUND Seasonal influenza and other respiratory tract infections are serious public health problems that need to be further addressed and investigated. Internet search data are recognized as a valuable source for forecasting influenza or other respiratory tract infection epidemics. However, the selection of internet search data and the application of forecasting methods are important for improving forecasting accuracy. The aim of the present study was to forecast influenza epidemics based on the long short-term memory neural network (LSTM) method, Baidu search index data, and the influenza-like-illness (ILI) rate. METHODS The official weekly ILI% data for northern and southern mainland China were obtained from the Chinese Influenza Center from 2018 to 2021. Based on the Baidu Index, search indices related to influenza infection over the corresponding time period were obtained. Pearson correlation analysis was performed to explore the association between influenza-related search queries and the ILI% of southern and northern mainland China. The LSTM model was used to forecast the influenza epidemic within the same week and at lags of 1-4 weeks. The model performance was assessed by evaluation metrics, including the mean square error (MSE), root mean square error (RMSE) and mean absolute error (MAE). RESULTS In total, 24 search queries in northern mainland China and 7 search queries in southern mainland China were found to be correlated and were used to construct the LSTM model, which included the same week and a lag of 1-4 weeks. The LSTM model showed that ILI% + mask with one lag week and ILI% + influenza name were good prediction modules, with reduced RMSE predictions of 16.75% and 4.20%, respectively, compared with the estimated ILI% for northern and southern mainland China. 
CONCLUSIONS The results illuminate the feasibility of using an internet search index as a complementary data source for influenza forecasting and the efficiency of using the LSTM model to forecast influenza epidemics.
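The evaluation metrics named in this abstract (MSE, RMSE, and MAE) can be sketched in a few lines. This is only an illustration, not the authors' code: the function names `error_metrics` and `rmse_reduction_pct` and the sample numbers are hypothetical, and reading the reported 16.75% and 4.20% as relative RMSE reductions against a baseline is an assumption.

```python
import math


def error_metrics(observed, predicted):
    """Return MSE, RMSE, and MAE for two equal-length numeric sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    errors = [o - p for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


def rmse_reduction_pct(baseline_rmse, model_rmse):
    """Relative RMSE reduction of a model against a baseline, in percent."""
    if baseline_rmse <= 0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


if __name__ == "__main__":
    # Hypothetical weekly ILI% values, used only to exercise the functions.
    observed = [2.1, 2.4, 3.0, 3.8]
    predicted = [2.0, 2.6, 2.9, 3.5]
    print(error_metrics(observed, predicted))
```

RMSE penalizes large misses more heavily than MAE, which is why forecasting studies often report both.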
Affiliation(s)
- Su Wei
  - School of Management Science and Engineering, Shandong University of Finance and Economics, Jinan, Shandong, 250014, People's Republic of China
- Sun Lin
  - Shandong Center for Disease Control and Prevention, Shandong Provincial Key Laboratory of Infectious Disease Control and Prevention, Shandong University Institution for Prevention Medicine, Jinan, Shandong, 250014, People's Republic of China
- Zhao Wenjing
  - Dezhou Center for Disease Control and Prevention, Dezhou, Shandong, 253000, People's Republic of China
- Song Shaoxia
  - Shandong Center for Disease Control and Prevention, Shandong Provincial Key Laboratory of Infectious Disease Control and Prevention, Shandong University Institution for Prevention Medicine, Jinan, Shandong, 250014, People's Republic of China
- Yang Yuejie
  - China Institute of Water Resources and Hydropower Research, Beijing, 100038, People's Republic of China
- He Yujie
  - Shandong Center for Disease Control and Prevention, Shandong Provincial Key Laboratory of Infectious Disease Control and Prevention, Shandong University Institution for Prevention Medicine, Jinan, Shandong, 250014, People's Republic of China
- Zhang Shu
  - Shandong Center for Disease Control and Prevention, Shandong Provincial Key Laboratory of Infectious Disease Control and Prevention, Shandong University Institution for Prevention Medicine, Jinan, Shandong, 250014, People's Republic of China
- Li Zhong
  - Shandong Center for Disease Control and Prevention, Shandong Provincial Key Laboratory of Infectious Disease Control and Prevention, Shandong University Institution for Prevention Medicine, Jinan, Shandong, 250014, People's Republic of China
- Liu Ti
  - Shandong Center for Disease Control and Prevention, Shandong Provincial Key Laboratory of Infectious Disease Control and Prevention, Shandong University Institution for Prevention Medicine, Jinan, Shandong, 250014, People's Republic of China
3
Yang L, Zhang T, Han X, Yang J, Sun Y, Ma L, Chen J, Li Y, Lai S, Li W, Feng L, Yang W. Influenza Epidemic Trend Surveillance and Prediction Based on Search Engine Data: Deep Learning Model Study. J Med Internet Res 2023; 25:e45085. [PMID: 37847532 PMCID: PMC10618884 DOI: 10.2196/45085] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/15/2022] [Revised: 07/24/2023] [Accepted: 08/04/2023] [Indexed: 10/18/2023]
Abstract
BACKGROUND Influenza outbreaks pose a significant threat to global public health. Traditional surveillance systems and simple algorithms often struggle to predict influenza outbreaks in an accurate and timely manner. Big data and modern technology have offered new modalities for disease surveillance and prediction. Influenza-like illness can serve as a valuable surveillance tool for emerging respiratory infectious diseases like influenza and COVID-19, especially when reported case data may not fully reflect the actual epidemic curve. OBJECTIVE This study aimed to develop a predictive model for influenza outbreaks by combining Baidu search query data with traditional virological surveillance data. The goal was to improve early detection and preparedness for influenza outbreaks in both northern and southern China, providing evidence for supplementing modern intelligence epidemic surveillance methods. METHODS We collected virological data from the National Influenza Surveillance Network and Baidu search query data from January 2011 to July 2018, totaling 3,691,865 and 1,563,361 respective samples. Relevant search terms related to influenza were identified and analyzed for their correlation with influenza-positive rates using Pearson correlation analysis. A distributed lag nonlinear model was used to assess the lag correlation of the search terms with influenza activity. Subsequently, a predictive model based on the gated recurrent unit and multiple attention mechanisms was developed to forecast the influenza-positive trend. RESULTS This study revealed a high correlation between specific Baidu search terms and influenza-positive rates in both northern and southern China, except for 1 term. The search terms were categorized into 4 groups: essential facts on influenza, influenza symptoms, influenza treatment and medicine, and influenza prevention, all of which showed correlation with the influenza-positive rate. 
The influenza prevention and influenza symptom groups had a lag correlation of 1.4-3.2 and 5.0-8.0 days, respectively. The Baidu search terms could help predict the influenza-positive rate 14-22 days in advance in southern China but interfered with influenza surveillance in northern China. CONCLUSIONS Complementing traditional disease surveillance systems with information from web-based data sources can aid in detecting warning signs of influenza outbreaks earlier. However, supplementation of modern surveillance with search engine information should be approached cautiously. This approach provides valuable insights for digital epidemiology and has the potential for broader application in respiratory infectious disease surveillance. Further research should explore the optimization and customization of search terms for different regions and languages to improve the accuracy of influenza prediction models.
Affiliation(s)
- Liuyang Yang
  - Department of Management Science and Information System, Faculty of Management and Economics, Kunming University of Science and Technology, Kunming, China
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Ting Zhang
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Xuan Han
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Jiao Yang
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Yanxia Sun
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Libing Ma
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
  - Department of Respiratory and Critical Care Medicine, Affiliated Hospital of Guilin Medical University, Guilin, China
- Jialong Chen
  - Department of Respiratory and Critical Care Medicine, Beijing Hospital, Beijing, China
- Yanming Li
  - Department of Respiratory and Critical Care Medicine, Beijing Hospital, Beijing, China
- Shengjie Lai
  - WorldPop, School of Geography and Environmental Science, University of Southampton, Southampton, United Kingdom
- Wei Li
  - The First People's Hospital of Yunnan Province, Affiliated Hospital of Kunming University of Science and Technology, Kunming, China
- Luzhao Feng
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Weizhong Yang
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
4
Jang B, Kim I, Kim JW. Long-Term Influenza Outbreak Forecast Using Time-Precedence Correlation of Web Data. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:2400-2412. [PMID: 34469319 DOI: 10.1109/tnnls.2021.3106637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Influenza leads to many deaths every year and is a threat to human health. For effective prevention, traditional national-scale statistical surveillance systems have been developed, and numerous studies have been conducted to predict influenza outbreaks using web data. Most studies have captured the short-term signs of influenza outbreaks, such as one-week prediction using the characteristics of web data uploaded in real time; however, long-term predictions of more than 2-10 weeks are required to effectively cope with influenza outbreaks. In this study, we determined that web data uploaded in real time have a time-precedence relationship with influenza outbreaks. For example, a few weeks before an influenza pandemic, the word "colds" appears frequently in web data. The web data after the appearance of the word "colds" can be used as information for forecasting future influenza outbreaks, which can improve long-term influenza prediction accuracy. In this study, we propose a novel long-term influenza outbreak forecast model utilizing the time precedence between the emergence of web data and an influenza outbreak. Based on the proposed model, we conducted experiments on: 1) selecting suitable web data for long-term influenza prediction; 2) determining whether the proposed model is regionally dependent; and 3) evaluating the accuracy according to the prediction timeframe. The proposed model showed a correlation of 0.87 in the long-term prediction of ten weeks while significantly outperforming other state-of-the-art methods.
5
Saegner T, Austys D. Forecasting and Surveillance of COVID-19 Spread Using Google Trends: Literature Review. International Journal of Environmental Research and Public Health 2022; 19:12394. [PMID: 36231693 PMCID: PMC9566212 DOI: 10.3390/ijerph191912394] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/19/2022] [Revised: 09/23/2022] [Accepted: 09/26/2022] [Indexed: 06/16/2023]
Abstract
The probability of future coronavirus disease 2019 (COVID-19) waves remains high; thus, COVID-19 surveillance and forecasting remain important. Online search engines harvest vast amounts of data from the general population in real time and make these data publicly accessible via such tools as Google Trends (GT). Therefore, the aim of this study was to review the literature on the possible use of GT for COVID-19 surveillance and the prediction of its outbreaks. We collected and reviewed articles about the possible use of GT for COVID-19 surveillance published in the first 2 years of the pandemic. The search yielded 54 publications, which were included in this review. The majority of the studies (83.3%) included in this review showed positive results for the possible use of GT for forecasting COVID-19 outbreaks. Most of the studies were performed in English-speaking countries (61.1%). The most frequently used keyword was "coronavirus" (53.7%), followed by "COVID-19" (31.5%) and "COVID" (20.4%). Many authors conducted analyses in multiple countries (46.3%) and obtained the same results for the majority of them, thus showing the robustness of the chosen methods. Various methods including long short-term memory (3.7%), random forest regression (3.7%), Adaboost algorithm (1.9%), autoregressive integrated moving average, neural network autoregression (1.9%), and vector error correction modeling (1.9%) were used for the analysis. Most of the publications with positive results (72.2%) used data from the first wave of the COVID-19 pandemic. Later, search volumes declined even though the incidence peaked. In most countries, GT data proved beneficial for forecasting and surveillance of COVID-19 spread.
Affiliation(s)
- Tobias Saegner
  - Department of Public Health, Institute of Health Sciences, Faculty of Medicine, Vilnius University, M. K. Čiurlionio 21/27, LT-03101 Vilnius, Lithuania
6
Early Warning of Infectious Diseases in Hospitals Based on Multi-Self-Regression Deep Neural Network. Journal of Healthcare Engineering 2022; 2022:8990907. [PMID: 36032546 PMCID: PMC9410942 DOI: 10.1155/2022/8990907] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/06/2022] [Accepted: 07/11/2022] [Indexed: 11/17/2022]
Abstract
Objective. Infectious diseases usually spread rapidly. This study aims to develop a model that can provide fine-grained early warnings of infectious diseases using real hospital data combined with disease transmission characteristics, weather, and other multi-source data. Methods. Based on daily data reported for infectious diseases collected from several large general hospitals in China between 2012 and 2020, seven common infectious diseases in medical institutions were screened and a multi self-regression deep (MSRD) neural network was constructed. Using a recurrent neural network as the basic structure, the model can effectively model the epidemiological trend of infectious diseases by considering the current influencing conditions while taking into account the historical development characteristics in time-series data. The fitting and prediction accuracy of the model were evaluated using mean absolute error (MAE) and root mean squared error. Results. The proposed approach is significantly better than the existing infectious disease dynamics model, susceptible-exposed-infected-removed (SEIR), as it addresses the concerns of difficult-to-obtain quantitative data such as latent population, overfitting of long time series, and considering only a single series of the number of sick people without considering the epidemiological characteristics of infectious diseases. We also compare certain machine learning methods in this study. Experimental results demonstrate that the proposed approach achieves an MAE of 0.6928 and 1.3782 for hand, foot, and mouth disease and influenza, respectively. Conclusion. The MSRD-based infectious disease prediction model proposed in this paper can provide daily and instantaneous updates and accurate predictions for epidemic trends.
7
Uda K, Hagiya H, Yorifuji T, Koyama T, Tsuge M, Yashiro M, Tsukahara H. Correlation between national surveillance and search engine query data on respiratory syncytial virus infections in Japan. BMC Public Health 2022; 22:1517. [PMID: 35945532 PMCID: PMC9363139 DOI: 10.1186/s12889-022-13899-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/06/2022] [Accepted: 07/25/2022] [Indexed: 11/10/2022]
Abstract
Background The respiratory syncytial virus (RSV) disease burden is significant, especially in infants and children with an underlying disease. Prophylaxis with palivizumab is recommended for these high-risk groups. Early recognition of an RSV epidemic is important for timely administration of palivizumab. We herein aimed to assess the correlation between national surveillance and Google Trends data pertaining to RSV infections in Japan. Methods This retrospective survey was performed between January 1, 2018 and November 14, 2021 and evaluated the correlation between national surveillance data and Google Trends data. Joinpoint regression was used to identify the points at which changes in trends occurred. Results A strong correlation was observed every study year (2018 [r = 0.87, p < 0.01], 2019 [r = 0.83, p < 0.01], 2020 [r = 0.83, p < 0.01], and 2021 [r = 0.96, p < 0.01]). The change-points in the Google Trends data indicating the start of the RSV epidemic were observed earlier than by sentinel surveillance in 2018 and 2021 and simultaneously with sentinel surveillance in 2019. No epidemic surge was observed in either the Google Trends or the surveillance data from 2020. Conclusions Our data suggested that Google Trends has the potential to enable the early identification of RSV epidemics. In countries without a national surveillance system, Google Trends may serve as an alternative early warning system.
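The yearly r values reported in this abstract are Pearson correlation coefficients between two weekly series. A minimal sketch of that computation follows; the function name `pearson_r` and the sample series are hypothetical and do not come from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical weekly values: surveillance case counts and a search index.
    cases = [12, 15, 30, 55, 80, 60]
    search_index = [20, 22, 41, 70, 100, 75]
    print(round(pearson_r(cases, search_index), 3))
```

A high r only shows that the two series move together within the study window; it does not by itself show that search activity leads the surveillance signal, which is why the study also examines change-points.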
Affiliation(s)
- Kazuhiro Uda
  - Department of Pediatrics, Okayama University Graduate School of Medicine, Dentistry, and Pharmaceutical Sciences, 2-5-1 Shikata, Okayama, 700-8558, Japan
  - Department of Pediatrics, Okayama University Hospital, 2-5-1 Shikata, Okayama, 700-8558, Japan
- Hideharu Hagiya
  - Department of General Medicine, Okayama University Graduate School of Medicine, Dentistry, and Pharmaceutical Science, 2-5-1 Shikata, Okayama, 700-8558, Japan
- Takashi Yorifuji
  - Department of Epidemiology, Graduate School of Medicine, Dentistry, and Pharmaceutical Sciences, Okayama University, 2-5-1 Shikata, Okayama, 700-8558, Japan
- Toshihiro Koyama
  - Department of Health Data Science, Graduate School of Medicine, Dentistry, and Pharmaceutical Sciences, Okayama University, 2-5-1 Shikata, Okayama, 700-8558, Japan
- Mitsuru Tsuge
  - Department of Pediatrics Acute Diseases, Okayama University Academic Field of Medicine, Dentistry, and Pharmaceutical Science, 2-5-1 Shikata, Okayama, 700-8558, Japan
- Masato Yashiro
  - Department of Pediatrics, Okayama University Hospital, 2-5-1 Shikata, Okayama, 700-8558, Japan
- Hirokazu Tsukahara
  - Department of Pediatrics, Okayama University Graduate School of Medicine, Dentistry, and Pharmaceutical Sciences, 2-5-1 Shikata, Okayama, 700-8558, Japan
8
Cawley C, Bergey F, Mehl A, Finckh A, Gilsdorf A. Novel Methods in the Surveillance of Influenza-Like Illness in Germany Using Data From a Symptom Assessment App (Ada): Observational Case Study. JMIR Public Health Surveill 2021; 7:e26523. [PMID: 34734836 PMCID: PMC8722671 DOI: 10.2196/26523] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/12/2021] [Revised: 08/04/2021] [Accepted: 08/16/2021] [Indexed: 11/13/2022]
Abstract
Background Participatory epidemiology is an emerging field harnessing consumer data entries of symptoms. The free app Ada allows users to enter the symptoms they are experiencing and applies a probabilistic reasoning model to provide a list of possible causes for these symptoms. Objective The objective of our study is to explore the potential contribution of Ada data to syndromic surveillance by comparing symptoms of influenza-like illness (ILI) entered by Ada users in Germany with data from a national population-based reporting system called GrippeWeb. Methods We extracted data for all assessments performed by Ada users in Germany over 3 seasons (2017/18, 2018/19, and 2019/20) and identified those with ILI (report of fever with cough or sore throat). The weekly proportion of assessments in which ILI was reported was calculated (overall and stratified by age group), standardized for the German population, and compared with trends in ILI rates reported by GrippeWeb using time series graphs, scatterplots, and Pearson correlation coefficient. Results In total, 2.1 million Ada assessments (for any symptoms) were included. Within seasons and across age groups, the Ada data broadly replicated trends in estimated weekly ILI rates when compared with GrippeWeb data (Pearson correlation—2017-18: r=0.86, 95% CI 0.76-0.92; P<.001; 2018-19: r=0.90, 95% CI 0.84-0.94; P<.001; 2019-20: r=0.64, 95% CI 0.44-0.78; P<.001). However, there were differences in the exact timing and nature of the epidemic curves between years. Conclusions With careful interpretation, Ada data could contribute to identifying broad ILI trends in countries without existing population-based monitoring systems or to the syndromic surveillance of symptoms not covered by existing systems.
9
Mukka M, Pesälä S, Hammer C, Mustonen P, Jormanainen V, Pelttari H, Kaila M, Helve O. Analyzing citizens' and healthcare professionals' searches for smell/taste disorders and coronavirus in Finland during the COVID-19 pandemic: Infodemiological approach using database logs. JMIR Public Health Surveill 2021; 7:e31961. [PMID: 34727525 PMCID: PMC8653973 DOI: 10.2196/31961] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/11/2021] [Revised: 10/12/2021] [Accepted: 10/28/2021] [Indexed: 11/25/2022]
Abstract
Background The COVID-19 pandemic has prevailed over a year, and log and register data on coronavirus have been utilized to establish models for detecting the pandemic. However, many sources contain unreliable health information on COVID-19 and its symptoms, and platforms cannot characterize the users performing searches. Prior studies have assessed symptom searches from general search engines (Google/Google Trends). Little is known about how modeling log data on smell/taste disorders and coronavirus from the dedicated internet databases used by citizens and health care professionals (HCPs) could enhance disease surveillance. Our material and method provide a novel approach to analyze web-based information seeking to detect infectious disease outbreaks. Objective The aim of this study was (1) to assess whether citizens’ and professionals’ searches for smell/taste disorders and coronavirus relate to epidemiological data on COVID-19 cases, and (2) to test our negative binomial regression modeling (ie, whether the inclusion of the case count could improve the model). Methods We collected weekly log data on searches related to COVID-19 (smell/taste disorders, coronavirus) between December 30, 2019, and November 30, 2020 (49 weeks). Two major medical internet databases in Finland were used: Health Library (HL), a free portal aimed at citizens, and Physician’s Database (PD), a database widely used among HCPs. Log data from databases were combined with register data on the numbers of COVID-19 cases reported in the Finnish National Infectious Diseases Register. We used negative binomial regression modeling to assess whether the case numbers could explain some of the dynamics of searches when plotting database logs. Results We found that coronavirus searches drastically increased in HL (0 to 744,113) and PD (4 to 5375) prior to the first wave of COVID-19 cases between December 2019 and March 2020. 
Searches for smell disorders in HL doubled from the end of December 2019 to the end of March 2020 (2148 to 4195), and searches for taste disorders in HL increased from mid-May to the end of November (0 to 1980). Case numbers were significantly associated with smell disorders (P<.001) and taste disorders (P<.001) in HL, and with coronavirus searches (P<.001) in PD. We could not identify any other associations between case numbers and searches in either database. Conclusions Novel infodemiological approaches could be used in analyzing database logs. Modeling log data from web-based sources was seen to improve the model only occasionally. However, search behaviors among citizens and professionals could be used as a supplementary source of information for infectious disease surveillance. Further research is needed to apply statistical models to log data of the dedicated medical databases.
Affiliation(s)
- Milla Mukka
  - University of Helsinki, Tukholmankatu 8B, Helsinki, FI
- Samuli Pesälä
  - University of Helsinki, Tukholmankatu 8B, Helsinki, FI
  - Epidemiological Operations Unit, Helsinki, FI
- Charlotte Hammer
  - European Programme for Intervention Epidemiology training, Solna, SE
  - Department of Health Security, Finnish Institute for Health and Welfare, Helsinki, FI
- Vesa Jormanainen
  - University of Helsinki, Tukholmankatu 8B, Helsinki, FI
  - Finnish Institute for Health and Welfare, Helsinki, FI
- Minna Kaila
  - Clinicum, University of Helsinki, Helsinki, FI
- Otto Helve
  - Department of Health Security, Finnish Institute for Health and Welfare, Helsinki, FI
  - Children's Hospital, Pediatric Research Center, University of Helsinki & Helsinki University Hospital, Helsinki, FI
10
Tozzi AE, Gesualdo F, Urbani E, Sbenaglia A, Ascione R, Procopio N, Croci I, Rizzo C. Digital Surveillance Through an Online Decision Support Tool for COVID-19 Over One Year of the Pandemic in Italy: Observational Study. J Med Internet Res 2021; 23:e29556. [PMID: 34292866 PMCID: PMC8366755 DOI: 10.2196/29556] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/16/2021] [Revised: 07/14/2021] [Accepted: 07/18/2021] [Indexed: 01/22/2023]
Abstract
BACKGROUND Italy has experienced severe consequences (ie, hospitalizations and deaths) during the COVID-19 pandemic. Online decision support systems (DSS) and self-triage applications have been used in several settings to supplement health authority recommendations to prevent and manage COVID-19. A digital Italian health tech startup, Paginemediche, developed a noncommercial, online DSS with a chat user interface to assist individuals in Italy manage their potential exposure to COVID-19 and interpret their symptoms since early in the pandemic. OBJECTIVE This study aimed to compare the trend in online DSS sessions with that of COVID-19 cases reported by the national health surveillance system in Italy, from February 2020 to March 2021. METHODS We compared the number of sessions by users with a COVID-19-positive contact and users with COVID-19-compatible symptoms with the number of cases reported by the national surveillance system. To calculate the distance between the time series, we used the dynamic time warping algorithm. We applied Symbolic Aggregate approXimation (SAX) encoding to the time series in 1-week periods. We calculated the Hamming distance between the SAX strings. We shifted time series of online DSS sessions 1 week ahead. We measured the improvement in Hamming distance to verify the hypothesis that online DSS sessions anticipate the trends in cases reported to the official surveillance system. RESULTS We analyzed 75,557 sessions in the online DSS; 65,207 were sessions by symptomatic users, while 19,062 were by contacts of individuals with COVID-19. The highest number of online DSS sessions was recorded early in the pandemic. Second and third peaks were observed in October 2020 and March 2021, respectively, preceding the surge in notified COVID-19 cases by approximately 1 week. 
The distance between sessions by users with COVID-19 contacts and reported cases calculated by dynamic time warping was 61.23; the corresponding distance for sessions by symptomatic users was 93.72. The time series of users with a COVID-19 contact was more consistent with the trend in confirmed cases. With the 1-week shift, the Hamming distance between the time series of sessions by users with a COVID-19 contact and reported cases improved from 0.49 to 0.46. We repeated the analysis, restricting the time window to between July 2020 and December 2020. The corresponding Hamming distance was 0.16 before and improved to 0.08 after the time shift. CONCLUSIONS Temporal trends in the number of online COVID-19 DSS sessions may precede the trend in reported COVID-19 cases through traditional surveillance. The trends in sessions by users with a COVID-19 contact may better predict reported cases of COVID-19 than sessions by symptomatic users. Data from online DSS may represent a useful supplement to traditional surveillance and support the identification of early warning signals in the COVID-19 pandemic.
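The SAX and Hamming-distance comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the alphabet size of 4, the Gaussian breakpoints, the function names, and the assumption that both series are already aggregated to weekly values are all choices made here for illustration. The Hamming distance is normalized to a fraction, which matches the 0-to-1 values reported in the abstract.

```python
from statistics import mean, pstdev

# Assumed alphabet size of 4: these breakpoints split a standard normal
# distribution into four equiprobable regions.
BREAKPOINTS = (-0.6745, 0.0, 0.6745)
ALPHABET = "abcd"


def sax_encode(weekly_values):
    """Z-normalize a weekly series and map each week to one SAX symbol."""
    mu = mean(weekly_values)
    sigma = pstdev(weekly_values) or 1.0  # guard against a constant series
    symbols = []
    for value in weekly_values:
        z = (value - mu) / sigma
        index = sum(z > b for b in BREAKPOINTS)  # count of breakpoints below z
        symbols.append(ALPHABET[index])
    return "".join(symbols)


def hamming_distance(a, b):
    """Fraction of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def shifted_distance(sessions, cases, shift_weeks):
    """Compare sessions in week t with cases in week t + shift_weeks."""
    if shift_weeks:
        sessions, cases = sessions[:-shift_weeks], cases[shift_weeks:]
    return hamming_distance(sax_encode(sessions), sax_encode(cases))
```

With made-up weekly counts in which sessions lead cases by exactly one week, `shifted_distance(sessions, cases, 1)` drops to 0 while the unshifted distance stays positive, which is the kind of improvement the abstract reports on real data.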
Affiliation(s)
- Alberto Eugenio Tozzi
- Multifactorial and Complex Diseases Research Area, Bambino Gesù Children's Hospital IRCCS, Rome, Italy
- Francesco Gesualdo
- Multifactorial and Complex Diseases Research Area, Bambino Gesù Children's Hospital IRCCS, Rome, Italy
- Ileana Croci
- Multifactorial and Complex Diseases Research Area, Bambino Gesù Children's Hospital IRCCS, Rome, Italy
- Caterina Rizzo
- Clinical Pathways and Epidemiology Unit, Bambino Gesù Children's Hospital IRCCS, Rome, Italy
|
11
|
Husnayain A, Chuang TW, Fuad A, Su ECY. High variability in model performance of Google relative search volumes in spatially clustered COVID-19 areas of the USA. Int J Infect Dis 2021; 109:269-278. [PMID: 34273513 PMCID: PMC8922685 DOI: 10.1016/j.ijid.2021.07.031] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/27/2021] [Revised: 06/22/2021] [Accepted: 07/11/2021] [Indexed: 12/24/2022]
Abstract
Objective: Incorporating spatial analyses and online health information queries may be beneficial in understanding the role of Google relative search volume (RSV) data as a secondary public health surveillance tool during pandemics. This study identified coronavirus disease 2019 (COVID-19) clustering and defined the predictability performance of Google RSV models in clustered and non-clustered areas of the USA. Methods: Getis-Ord General and local G statistics were used to identify monthly clustering patterns. Monthly country- and state-level correlations between new daily COVID-19 cases and Google RSVs were assessed using Spearman's rank correlation coefficients and Poisson regression models for January–December 2020. Results: Huge clusters involving multiple states were found, which resulted from various control measures in each state. This demonstrates the importance of state-to-state coordination in implementing control measures to tackle the spread of outbreaks. Variability in Google RSV model performance was found among states and time periods, possibly suggesting the need to use different frameworks for Google RSV data in each state. Moreover, the sign of correlation can be utilized to understand public responses to control and preventive measures, as well as in communicating risk. Conclusion: COVID-19 Google RSV model accuracy in the USA may be influenced by COVID-19 transmission dynamics, policy-driven community awareness and past outbreak experiences.
Affiliation(s)
- Atina Husnayain
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan; Department of Biostatistics, Epidemiology and Population Health, Faculty of Medicine, Public Health and Nursing, Universitas Gadjah Mada, Yogyakarta, Indonesia
- Ting-Wu Chuang
- Department of Molecular Parasitology and Tropical Diseases, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Anis Fuad
- Department of Biostatistics, Epidemiology and Population Health, Faculty of Medicine, Public Health and Nursing, Universitas Gadjah Mada, Yogyakarta, Indonesia
- Emily Chia-Yu Su
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan; Clinical Big Data Research Centre, Taipei Medical University Hospital, Taipei, Taiwan
|
12
|
Jang B, Kim I, Kim JW. Effective Training Data Extraction Method to Improve Influenza Outbreak Prediction from Online News Articles: Deep Learning Model Study. JMIR Med Inform 2021; 9:e23305. [PMID: 34032577 PMCID: PMC8188311 DOI: 10.2196/23305] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/07/2020] [Revised: 10/13/2020] [Accepted: 04/01/2021] [Indexed: 11/13/2022]
Abstract
Background Each year, influenza affects 3 to 5 million people and causes 290,000 to 650,000 fatalities worldwide. To reduce the fatalities caused by influenza, several countries have established influenza surveillance systems to collect early warning data. However, proper and timely warnings are hindered by a 1- to 2-week delay between the actual disease outbreaks and the publication of surveillance data. To address the issue, novel methods for influenza surveillance and prediction using real-time internet data (such as search queries, microblogging, and news) have been proposed. Some of the currently popular approaches extract online data and use machine learning to predict influenza occurrences in a classification mode. However, many of these methods extract training data subjectively, and it is difficult to capture the latent characteristics of the data correctly. There is a critical need to devise new approaches that focus on extracting training data by reflecting the latent characteristics of the data. Objective In this paper, we propose an effective method to extract training data in a manner that reflects the hidden features and improves the performance by filtering and selecting only the keywords related to influenza before the prediction. Methods Although word embedding provides a distributed representation of words by encoding the hidden relationships between various tokens, we enhanced the word embeddings by selecting keywords related to the influenza outbreak and sorting the extracted keywords using the Pearson correlation coefficient in order to solely keep the tokens with high correlation with the actual influenza outbreak. The keyword extraction process was followed by a predictive model based on long short-term memory that predicts the influenza outbreak. To assess the performance of the proposed predictive model, we used and compared a variety of word embedding techniques. 
Results Word embedding without our proposed sorting process showed 0.8705 prediction accuracy when 50.2 keywords were selected on average. Conversely, word embedding using our proposed sorting process showed 0.8868 prediction accuracy and an improvement in prediction accuracy of 12.6%, although smaller amounts of training data were selected, with only 20.6 keywords on average. Conclusions The sorting stage empowers the embedding process, which improves the feature extraction process because it acts as a knowledge base for the prediction component. The model outperformed other current approaches that use flat extraction before prediction.
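The correlation-based keyword sorting step described in this abstract can be sketched as follows. This is a hedged illustration only: the function names, the `threshold` value, and the input layout (one count series per keyword, aligned week by week with the outbreak series) are assumptions, and the abstract does not specify how the authors chose their cutoff. The word-embedding and long short-term memory stages are not shown.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def select_keywords(keyword_counts, outbreak_series, threshold=0.5):
    """Keep keywords whose counts correlate with the outbreak series.

    keyword_counts maps each candidate keyword to its count series
    (hypothetical layout). Returns (keyword, r) pairs sorted by r,
    highest first. The threshold is an assumed cutoff.
    """
    scored = [
        (keyword, pearson(counts, outbreak_series))
        for keyword, counts in keyword_counts.items()
    ]
    kept = [(keyword, r) for keyword, r in scored if r >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Only the retained keywords would then be passed on as training features, which is how a smaller keyword set can still carry most of the predictive signal.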
Affiliation(s)
- Beakcheol Jang
- Graduate School of Information, Yonsei University, Seoul, Republic of Korea
- Inhwan Kim
- Graduate School of Information, Yonsei University, Seoul, Republic of Korea
- Jong Wook Kim
- Department of Computer Science, Sangmyung University, Seoul, Republic of Korea
|
13
|
Trends of Online Search of COVID-19 Related Terms in Cyprus. Epidemiologia (Basel, Switzerland) 2021; 2:36-45. [PMID: 36417188 PMCID: PMC9620905 DOI: 10.3390/epidemiologia2010004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/03/2020] [Revised: 01/04/2021] [Accepted: 01/15/2021] [Indexed: 12/14/2022]
Abstract
Knowledge of trends in web searches provides useful information for various purposes, including responses to public health emergencies. This work aims to analyze the popularity of internet search queries for Coronavirus Disease 2019 (COVID-19) and COVID-19 symptoms in Cyprus. Query data for the term Coronavirus were retrieved from the Google Trends website between 19 January and 30 June 2020. The study focused on Cyprus and its four most populated cities: Nicosia, Limassol, Larnaca, and Paphos. COVID-19 symptoms including fever, cough, sore throat, shortness of breath, and myalgia were considered in the analysis. Daily and weekly search volumes were described, and their correlation with the evolution of the COVID-19 pandemic and with important announcements or events was examined. Three peak periods of interest were identified in Cyprus. The highest interest in COVID-19-related terms was found in the city of Paphos. The most popular symptoms were fever and cough, and the symptom with the highest increase in popularity was myalgia. At the beginning of the pandemic, the search volume of COVID-19 grew substantially when governments, major organizations, and high-profile figures, globally and locally, made important announcements regarding COVID-19. Health authorities in Cyprus and elsewhere could benefit from constantly monitoring the online interest of the population in order to get timely information that could be used in public health planning and response.
|