1
Panja M, Chakraborty T, Kumar U, Liu N. Epicasting: An Ensemble Wavelet Neural Network for forecasting epidemics. Neural Netw 2023; 165:185-212. [PMID: 37307664 DOI: 10.1016/j.neunet.2023.05.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Revised: 03/11/2023] [Accepted: 05/27/2023] [Indexed: 06/14/2023]
Abstract
Infectious diseases remain among the top contributors to human illness and death worldwide, among which many diseases produce epidemic waves of infection. The lack of specific drugs and ready-to-use vaccines to prevent most of these epidemics worsens the situation. These limitations force public health officials and policymakers to rely on early warning systems generated by accurate and reliable epidemic forecasters. Accurate forecasts of epidemics can assist stakeholders in tailoring countermeasures, such as vaccination campaigns, staff scheduling, and resource allocation, to the situation at hand, which could translate to reductions in the impact of a disease. Unfortunately, most of these past epidemics exhibit nonlinear and non-stationary characteristics due to their spreading fluctuations based on season-dependent variability and the nature of these epidemics. We analyze various epidemic time series datasets using a maximal overlap discrete wavelet transform (MODWT) based autoregressive neural network and call it the Ensemble Wavelet Neural Network (EWNet) model. MODWT techniques effectively characterize non-stationary behavior and seasonal dependencies in the epidemic time series and improve the nonlinear forecasting scheme of the autoregressive neural network in the proposed ensemble wavelet network framework. From a nonlinear time series viewpoint, we explore the asymptotic stationarity of the proposed EWNet model to show the asymptotic behavior of the associated Markov Chain. We also theoretically investigate the effect of learning stability and the choice of hidden neurons in the proposal. From a practical perspective, we compare our proposed EWNet framework with twenty-two statistical, machine learning, and deep learning models for fifteen real-world epidemic datasets with three test horizons using four key performance indicators. Experimental results show that the proposed EWNet is highly competitive compared to the state-of-the-art epidemic forecasting methods.
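The MODWT step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the Haar filter, a single decomposition level, and circular boundary handling, and the helper names `haar_modwt_level1` and `lagged_pairs` are hypothetical. The abstract does not state the filter, the number of levels, or the lag order, so those are placeholders here.

```python
from typing import List, Tuple


def haar_modwt_level1(x: List[float]) -> Tuple[List[float], List[float]]:
    """Level-1 Haar MODWT with circular boundary handling (illustrative only).

    Returns (wavelet, scaling) coefficient series, each the same length as x.
    """
    n = len(x)
    # For t == 0, x[t - 1] wraps to the last element, which gives the
    # circular (periodic) boundary treatment.
    wavelet = [0.5 * (x[t] - x[t - 1]) for t in range(n)]
    scaling = [0.5 * (x[t] + x[t - 1]) for t in range(n)]
    return wavelet, scaling


def lagged_pairs(series: List[float], p: int) -> List[Tuple[List[float], float]]:
    """Build (previous p values, next value) pairs for an autoregressive learner."""
    return [(series[t - p:t], series[t]) for t in range(p, len(series))]


cases = [2.0, 4.0, 6.0, 8.0]  # made-up counts, not data from the paper
wavelet, scaling = haar_modwt_level1(cases)
training_pairs = lagged_pairs(scaling, 2)  # lag order 2 is an arbitrary choice
```

Unlike the decimated wavelet transform, each coefficient series keeps the length of the input, so lagged input-target pairs can be built from every component and passed to separate autoregressive learners. For this particular filter and level the two series also sum back to the original observations.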
Affiliation(s)
- Madhurima Panja
  - Spatial Computing Laboratory, Center for Data Sciences, International Institute of Information Technology Bangalore, India
- Tanujit Chakraborty
  - Department of Science and Engineering, Sorbonne University Abu Dhabi, United Arab Emirates; Spatial Computing Laboratory, Center for Data Sciences, International Institute of Information Technology Bangalore, India; School of Business, Woxsen University, Telangana, India
- Uttam Kumar
  - Spatial Computing Laboratory, Center for Data Sciences, International Institute of Information Technology Bangalore, India
- Nan Liu
  - Duke-NUS Medical School, National University of Singapore, Singapore
2
Keshavamurthy R, Charles LE. Predicting Kyasanur forest disease in resource-limited settings using event-based surveillance and transfer learning. Sci Rep 2023; 13:11067. [PMID: 37422454 PMCID: PMC10329696 DOI: 10.1038/s41598-023-38074-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Accepted: 07/02/2023] [Indexed: 07/10/2023] Open
Abstract
In recent years, reports of Kyasanur forest disease (KFD) breaking endemic barriers by spreading to new regions and crossing state boundaries have been alarming. Effective disease surveillance and reporting systems are lacking for this emerging zoonosis, hence hindering control and prevention efforts. We compared time-series models using weather data with and without Event-Based Surveillance (EBS) information, i.e., news media reports and internet search trends, to predict monthly KFD cases in humans. We fitted Extreme Gradient Boosting (XGB) and Long Short-Term Memory models at the national and regional levels. We utilized the rich epidemiological data from endemic regions by applying Transfer Learning (TL) techniques to predict KFD cases in new outbreak regions where disease surveillance information was scarce. Overall, the inclusion of EBS data, in addition to the weather data, substantially increased the prediction performance across all models. The XGB method produced the best predictions at the national and regional levels. The TL techniques outperformed baseline models in predicting KFD in new outbreak regions. Novel sources of data and advanced machine-learning approaches, e.g., EBS and TL, show great potential towards increasing disease prediction capabilities in data-scarce scenarios and/or resource-limited settings, for better-informed decisions in the face of emerging zoonotic threats.
Affiliation(s)
- Ravikiran Keshavamurthy
  - Pacific Northwest National Laboratory, Richland, WA, 99354, USA
  - Paul G. Allen School for Global Health, Washington State University, Pullman, WA, 99164, USA
- Lauren E Charles
  - Pacific Northwest National Laboratory, Richland, WA, 99354, USA
  - Paul G. Allen School for Global Health, Washington State University, Pullman, WA, 99164, USA
3
Stolerman LM, Clemente L, Poirier C, Parag KV, Majumder A, Masyn S, Resch B, Santillana M. Using digital traces to build prospective and real-time county-level early warning systems to anticipate COVID-19 outbreaks in the United States. Sci Adv 2023; 9:eabq0199. [PMID: 36652520 PMCID: PMC9848273 DOI: 10.1126/sciadv.abq0199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2022] [Accepted: 12/19/2022] [Indexed: 06/17/2023]
Abstract
Coronavirus disease 2019 (COVID-19) continues to affect the world, and the design of strategies to curb disease outbreaks requires close monitoring of their trajectories. We present machine learning methods that leverage internet-based digital traces to anticipate sharp increases in COVID-19 activity in U.S. counties. In a complementary direction to the efforts led by the Centers for Disease Control and Prevention (CDC), our models are designed to detect the time when an uptrend in COVID-19 activity will occur. Motivated by the need for finer spatial resolution epidemiological insights, we build upon previous efforts conceived at the state level. Our methods (tested in an out-of-sample manner, as events were unfolding, in 97 counties representative of multiple population sizes across the United States) frequently anticipated increases in COVID-19 activity 1 to 6 weeks before local outbreaks, defined when the effective reproduction number Rt becomes larger than 1 for a period of 2 weeks.
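The outbreak definition in this abstract (Rt above 1 for a period of 2 weeks) can be sketched as a simple run-length check. The sketch assumes daily Rt estimates, so that two weeks corresponds to 14 consecutive values; the function name `outbreak_onset` and its parameters are hypothetical, and the abstract does not say how the authors estimated Rt or handled gaps.

```python
from typing import Optional, Sequence


def outbreak_onset(rt: Sequence[float],
                   min_run: int = 14,
                   threshold: float = 1.0) -> Optional[int]:
    """Return the index where the first qualifying run starts, or None.

    A qualifying run is `min_run` consecutive values strictly above
    `threshold`. Daily estimates are assumed, hence the default of 14.
    """
    run = 0
    for i, value in enumerate(rt):
        if value > threshold:
            run += 1
            if run == min_run:
                return i - min_run + 1  # first index of the completed run
        else:
            run = 0  # the run is broken, start counting again
    return None


# Made-up series: five quiet days followed by fourteen days above 1.
example_rt = [0.9] * 5 + [1.1] * 14
onset = outbreak_onset(example_rt)
```

With weekly estimates the same function applies with `min_run=2`.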
Affiliation(s)
- Lucas M. Stolerman
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Department of Mathematics, Oklahoma State University, Stillwater, OK, USA
- Leonardo Clemente
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Machine Intelligence Group for the Betterment of Health and the Environment, Network Science Institute, Northeastern University, Boston, MA, USA
- Canelle Poirier
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Kris V. Parag
  - NIHR Health Protection Research Unit, Behavioural Science and Evaluation, University of Bristol, Bristol, UK
- Serge Masyn
  - Global Public Health, Janssen R&D, Beerse, Belgium
- Bernd Resch
  - Department of Geoinformatics - Z-GIS, University of Salzburg, Salzburg, Austria
  - Center for Geographic Analysis, Harvard University, Cambridge, MA, USA
- Mauricio Santillana
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Machine Intelligence Group for the Betterment of Health and the Environment, Network Science Institute, Northeastern University, Boston, MA, USA
  - Harvard University, T.H. Chan School of Public Health, Boston, MA, USA
4
Ma MZ. Heightened religiosity proactively and reactively responds to the COVID-19 pandemic across the globe: Novel insights from the parasite-stress theory of sociality and the behavioral immune system theory. Int J Intercult Relat 2022; 90:38-56. [PMID: 35855693 PMCID: PMC9276875 DOI: 10.1016/j.ijintrel.2022.07.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Revised: 07/05/2022] [Accepted: 07/10/2022] [Indexed: 06/15/2023]
Abstract
According to the parasite-stress theory of sociality and the behavioral immune system theory, heightened religiosity serves an anti-pathogen function by promoting in-group assortative sociality. Thus, highly religious countries/territories could have better control of COVID-19 (proactively avoiding disease threat), and heightened COVID-19 threat could increase religiosity (reactively responding to disease threat). As expected, country-level religiosity (religion-related online searches (Allah, Buddhism, Jesus, etc.) and number of total religions/ethnoreligions) negatively and significantly predicted COVID-19 severity (a composite index of COVID-19 susceptibility, reproductive rate, morbidity, and mortality rates) (Study 1a), after accounting for covariates (e.g., socioeconomic factors, ecological factors, collectivism index, cultural tightness-looseness index, COVID-19 policy response, test-to-case ratio). Moreover, multilevel analysis accounting for daily- (e.g., time-trend effect, season) and macro-level (same as in Study 1a) covariates showed that country-level religious searches, compared with the number of total religions/ethnoreligions, were more robust in negatively and significantly predicting daily-level COVID-19 severity during early pandemic stages (Study 1b). At the weekly level, perceived coronavirus threat measured with coronavirus-related searches (corona, covid, covid-19, etc.), compared with actual COVID-19 threat measured with epidemiological data, showed larger effects in positively predicting religious searches (Study 2), after accounting for weekly- (e.g., autocorrelation, time-trend effect, season, religious holidays, major-illness-related searches) and macro-level (e.g., Christian-majority country/territory and all country-level variables in Study 1) covariates. Accordingly, heightened religiosity could proactively and reactively respond to the COVID-19 pandemic across the globe.
Affiliation(s)
- Mac Zewei Ma
  - Department of Social and Behavioural Sciences, City University of Hong Kong, Hong Kong
5
Sparse representations of high dimensional neural data. Sci Rep 2022; 12:7295. [PMID: 35508638 PMCID: PMC9068763 DOI: 10.1038/s41598-022-10459-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2021] [Accepted: 04/01/2022] [Indexed: 11/08/2022] Open
Abstract
Conventional Vector Autoregressive (VAR) modelling methods applied to high dimensional neural time series data result in noisy solutions that are dense or have a large number of spurious coefficients. This reduces the speed and accuracy of auxiliary computations downstream and inflates the time required to compute functional connectivity networks by a factor that is at least inversely proportional to the true network density. As these noisy solutions have distorted coefficients, thresholding them as per some criterion, statistical or otherwise, does not alleviate the problem. Thus obtaining a sparse representation of such data is important since it provides an efficient representation of the data and facilitates its further analysis. We propose a fast Sparse Vector Autoregressive Greedy Search (SVARGS) method that works well for high dimensional data, even when the number of time points is relatively low, by incorporating only statistically significant coefficients. In numerical experiments, our methods show high accuracy in recovering the true sparse model. The relative absence of spurious coefficients permits accurate, stable and fast evaluation of derived quantities such as power spectrum, coherence and Granger causality. Consequently, sparse functional connectivity networks can be computed, in a reasonable time, from data comprising tens of thousands of channels/voxels. This enables a much higher resolution analysis of functional connectivity patterns and community structures in such large networks than is possible using existing time series methods. We apply our method to EEG data where computed network measures and community structures are used to distinguish emotional states as well as to ADHD fMRI data where it is used to distinguish children with ADHD from typically developing children.
6
Nadda W, Boonchieng W, Boonchieng E. Influenza, dengue and common cold detection using LSTM with fully connected neural network and keywords selection. BioData Min 2022; 15:5. [PMID: 35164818 PMCID: PMC8842807 DOI: 10.1186/s13040-022-00288-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/16/2021] [Accepted: 01/23/2022] [Indexed: 11/29/2022] Open
Abstract
Symptom-based machine learning models for disease detection are a way to reduce the workload of doctors when they have too many patients. Currently, there are many research studies on machine learning or deep learning for disease detection or clinical department classification, using the text of patients’ symptoms and vital signs. In this study, we used the Long Short-term Memory (LSTM) with a fully connected neural network model for classification, where the LSTM model was used to receive the patient’s symptoms text as input data. The fully connected neural network was used to receive other input data from the patients, including body temperature, age, gender, and the month the patients received care in. In this research, a data preprocessing algorithm was improved by using keyword selection to reduce the complexity of the input data and thereby prevent overfitting. The results showed that the LSTM with fully connected neural network model performed better than the LSTM model. The keyword selection method also increased model performance.
Affiliation(s)
- Wanchaloem Nadda
  - Department of Computer Science, Faculty of Science, Chiang Mai University, Chiang Mai, 50200, Thailand
- Waraporn Boonchieng
  - Faculty of Public Health, Chiang Mai University, Chiang Mai, 50200, Thailand
- Ekkarat Boonchieng
  - Center of Excellence in Community Health Informatics, Department of Computer Science, Faculty of Science, Chiang Mai University, Chiang Mai, 50200, Thailand
7
Data-driven methods for dengue prediction and surveillance using real-world and Big Data: A systematic review. PLoS Negl Trop Dis 2022; 16:e0010056. [PMID: 34995281 PMCID: PMC8740963 DOI: 10.1371/journal.pntd.0010056] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/30/2021] [Accepted: 12/06/2021] [Indexed: 12/23/2022] Open
Abstract
Background: Traditionally, dengue surveillance is based on case reporting to a central health agency. However, the delay between a case and its notification can limit the system responsiveness. Machine learning methods have been developed to reduce the reporting delays and to predict outbreaks, based on non-traditional and non-clinical data sources. The aim of this systematic review was to identify studies that used real-world data, Big Data and/or machine learning methods to monitor and predict dengue-related outcomes.
Methodology/Principal findings: We performed a search in PubMed, Scopus, Web of Science and grey literature between January 1, 2000 and August 31, 2020. The review (ID: CRD42020172472) focused on data-driven studies. Reviews, randomized control trials and descriptive studies were not included. Among the 119 studies included, 67% were published between 2016 and 2020, and 39% used at least one novel data stream. The aim of the included studies was to predict a dengue-related outcome (55%), assess the validity of data sources for dengue surveillance (23%), or both (22%). Most studies (60%) used a machine learning approach. Studies on dengue prediction compared different prediction models, or identified significant predictors among several covariates in a model. The most significant predictors were rainfall (43%), temperature (41%), and humidity (25%). The two models with the highest performances were Neural Networks and Decision Trees (52%), followed by Support Vector Machine (17%). We cannot rule out a selection bias in our study because of our two main limitations: we did not include preprints and could not obtain the opinion of other international experts.
Conclusions/Significance: Combining real-world data and Big Data with machine learning methods is a promising approach to improve dengue prediction and monitoring. Future studies should focus on how to better integrate all available data sources and methods to improve the response and dengue management by stakeholders.
Dengue is one of the most important arbovirus infections in the world and its public health, societal and economic burden is increasing. Although the majority of dengue cases are asymptomatic or mild, severe disease forms can lead to death. For this reason, early diagnosis and monitoring of dengue are crucial to decrease mortality. However, most endemic regions still rely on traditional monitoring methods, despite the growing availability of novel data sources and data-driven methods based on real-world data, Big Data, and machine learning algorithms. In this systematic review, we identified and analyzed studies that used these novel approaches for dengue monitoring and/or prediction. We found that novel data streams, such as Internet search engines and social media platforms, and machine learning methods can be successfully used to improve dengue management, but are still vastly ignored in real life. These approaches should be combined with traditional methods to help stakeholders better prepare for each outbreak and improve early responsiveness.
8
Poirier C, Hswen Y, Bouzillé G, Cuggia M, Lavenu A, Brownstein JS, Brewer T, Santillana M. Influenza forecasting for French regions combining EHR, web and climatic data sources with a machine learning ensemble approach. PLoS One 2021; 16:e0250890. [PMID: 34010293 PMCID: PMC8133501 DOI: 10.1371/journal.pone.0250890] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/10/2021] [Accepted: 04/16/2021] [Indexed: 11/25/2022] Open
Abstract
Effective and timely disease surveillance systems have the potential to help public health officials design interventions to mitigate the effects of disease outbreaks. Currently, healthcare-based disease monitoring systems in France offer influenza activity information that lags real-time by one to three weeks. This temporal data gap introduces uncertainty that prevents public health officials from having a timely perspective on the population-level disease activity. Here, we present a machine-learning modeling approach that produces real-time estimates and short-term forecasts of influenza activity for the twelve continental regions of France by leveraging multiple disparate data sources that include Google search activity, real-time and local weather information, flu-related Twitter micro-blogs, electronic health records data, and historical disease activity synchronicities across regions. Our results show that all data sources contribute to improving influenza surveillance and that machine-learning ensembles that combine all data sources lead to accurate and timely predictions.
Affiliation(s)
- Canelle Poirier
  - INSERM, U1099, Rennes, France
  - Université de Rennes 1, LTSI, Rennes, France
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States of America
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, United States of America
  - * E-mail: (CP); (MS)
- Yulin Hswen
  - Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA, United States of America
  - Innovation Program, Boston Children’s Hospital, Boston, MA, United States of America
- Guillaume Bouzillé
  - INSERM, U1099, Rennes, France
  - Université de Rennes 1, LTSI, Rennes, France
  - CHU Rennes, Centre de Données Cliniques, Rennes, France
- Marc Cuggia
  - INSERM, U1099, Rennes, France
  - Université de Rennes 1, LTSI, Rennes, France
  - CHU Rennes, Centre de Données Cliniques, Rennes, France
- Audrey Lavenu
  - Université de Rennes 1, Faculté de médecine, Rennes, France
  - INSERM CIC 1414, Université de Rennes 1, Rennes, France
  - IRMAR, Institut de Recherche Mathématique de Rennes, Rennes, France
- John S. Brownstein
  - Innovation Program, Boston Children’s Hospital, Boston, MA, United States of America
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States of America
- Thomas Brewer
  - Innovation Program, Boston Children’s Hospital, Boston, MA, United States of America
- Mauricio Santillana
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States of America
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, United States of America
  - * E-mail: (CP); (MS)
9
Trends of Online Search of COVID-19 Related Terms in Cyprus. Epidemiologia (Basel) 2021; 2:36-45. [PMID: 36417188 PMCID: PMC9620905 DOI: 10.3390/epidemiologia2010004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/03/2020] [Revised: 01/04/2021] [Accepted: 01/15/2021] [Indexed: 12/14/2022]
Abstract
Knowledge of trends in web searches provides useful information for various purposes, including responses to public health emergencies. This work aims to analyze the popularity of internet search queries for Coronavirus Disease 2019 (COVID-19) and COVID-19 symptoms in Cyprus. Query data for the term Coronavirus were retrieved from Google Trends website between 19 January and 30 June 2020. The study focused on Cyprus and the four most populated cities: Nicosia, Limassol, Larnaca, and Paphos. COVID-19 symptoms including fever, cough, sore throat, shortness of breath, and myalgia were considered in the analysis. Daily and weekly search volumes were described, and their correlation with the evolution of the COVID-19 pandemic and important announcements or events were examined. Three periods of interest peaks were identified in Cyprus. The highest interest in COVID-19-related terms was found in the city of Paphos. The most popular symptoms were fever and cough, and the symptom with the highest increase in popularity was myalgia. At the beginning of the pandemic, the search volume of COVID-19 grew substantially when governments, major organizations, and high-profile figures, globally and locally, made important announcements regarding COVID-19. Health authorities in Cyprus and elsewhere could benefit from constantly monitoring the online interest of the population in order to get timely information that could be used in public health planning and response.
10
A call for an ethical framework when using social media data for artificial intelligence applications in public health research. Can Commun Dis Rep 2020; 46:169-173. [PMID: 32673381 DOI: 10.14745/ccdr.v46i06a03] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
Advancements in artificial intelligence (AI), more precisely the subfield of machine learning, and their applications to open-source internet data, such as social media, are growing faster than the management of ethical issues for use in society. An ethical framework helps scientists and policy makers consider ethics in their fields of practice, legitimize their work and protect members of the data-generating public. A central question for advancing the ethical framework is whether or not Tweets, Facebook posts and other open-source social media data generated by the public represent a human or not. The objective of this paper is to highlight ethical issues that the public health sector will be or is already confronting when using social media data in practice. The issues include informed consent, privacy, anonymization and balancing these issues with the benefits of using social media data for the common good. Current ethical frameworks need to provide guidance for addressing issues arising from the use of social media data in the public health sector. Discussions in this area should occur while the application of open-source data is still relatively new, and they should also keep pace as other problems arise from ongoing technological change.