1
Rahman MS, Amrin M, Bokkor Shiddik MA. Dengue Early Warning System and Outbreak Prediction Tool in Bangladesh Using Interpretable Tree-Based Machine Learning Model. Health Sci Rep 2025; 8:e70726. [PMID: 40352319 PMCID: PMC12063067 DOI: 10.1002/hsr2.70726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Revised: 03/18/2025] [Accepted: 04/04/2025] [Indexed: 05/14/2025]
Abstract
Background and Aims: Dengue fever (DF), a life-threatening vector-borne disease, poses significant public health and economic threats globally, including in Bangladesh. Determining dengue risk factors is crucial for early warning systems that forecast disease epidemics and support efficient control strategies. To address this, we propose an interpretable tree-based machine learning (ML) model for dengue early warning and outbreak prediction in Bangladesh based on climatic, sociodemographic, and landscape factors. Methods: A framework for forecasting DF risk was developed using high-performance ML algorithms, namely Random Forests, eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), based on sociodemographic, climate, landscape, and dengue surveillance epidemiological data (January 2000 to December 2021). The optimal tree-based ML model with strong interpretability was identified by comparing various ML models using hyperparameter optimization. The feature importance ranking and the most significant dengue driver were identified using SHapley Additive exPlanations (SHAP) values. Results: Our study detected nonlinear effects of climatic parameters on dengue at different thresholds, such as mean temperature (27°C), minimum temperature (22°C), maximum temperature (32°C), and relative humidity (82%). The optimal minimum and maximum temperatures, humidity, rainfall, and wind speed for dengue risk are 25-28°C, 32-34°C, 75%-85%, 10 mm, and 12 m/s, respectively. The LightGBM model accurately forecasts DF, and agricultural land, population density, and minimum temperature significantly affect dengue outbreaks in Bangladesh. Conclusion: Our proposed ML model functions as an early warning system, improving comprehension of the factors that precipitate dengue outbreaks and providing a framework for sophisticated analytical techniques in public health.
2
Zheng HL, An SY, Qiao BJ, Guan P, Huang DS, Wu W. A data-driven interpretable ensemble framework based on tree models for forecasting the occurrence of COVID-19 in the USA. Environmental Science and Pollution Research International 2023; 30:13648-13659. [PMID: 36131178 PMCID: PMC9492466 DOI: 10.1007/s11356-022-23132-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/12/2022] [Accepted: 09/16/2022] [Indexed: 06/15/2023]
Abstract
The prevalence of coronavirus disease 2019 (COVID-19) has become one of the most serious public health crises. Tree-based machine learning methods, with the advantages of high efficiency and strong interpretability, have been widely used in predicting diseases. A data-driven interpretable ensemble framework based on tree models was designed to forecast daily new cases of COVID-19 in the USA and to determine the important factors related to COVID-19. Using a hyperparameter optimization technique, we developed three machine learning algorithms based on decision trees, namely random forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), and used three linear ensemble models to integrate their outputs for better prediction accuracy. Finally, SHapley Additive exPlanations (SHAP) values were used to obtain the feature importance ranking. Our results demonstrated that, among the three base learners, prediction accuracy ranked in descending order as LightGBM, XGBoost, and RF. The optimized LAD ensemble was the most precise prediction model, reducing the prediction error of the best base learner (LightGBM) by approximately 3.111%. Vaccination, wearing masks, less mobility, and government interventions had positive effects on the control and prevention of COVID-19.
Affiliation(s)
- Hu-Li Zheng
  - Department of Epidemiology, School of Public Health, China Medical University, No. 77 Puhe Road, Shenyang, Liaoning Province, China
- Shu-Yi An
  - Liaoning Provincial Center for Disease Control and Prevention, Shenyang, Liaoning, China
- Bao-Jun Qiao
  - Liaoning Provincial Center for Disease Control and Prevention, Shenyang, Liaoning, China
- Peng Guan
  - Department of Epidemiology, School of Public Health, China Medical University, No. 77 Puhe Road, Shenyang, Liaoning Province, China
- De-Sheng Huang
  - Department of Mathematics, School of Intelligent Medicine, China Medical University, Shenyang, Liaoning, China
- Wei Wu
  - Department of Epidemiology, School of Public Health, China Medical University, No. 77 Puhe Road, Shenyang, Liaoning Province, China
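
The linear ensemble step mentioned in the abstract of this entry can be illustrated with a minimal sketch. It assumes that "LAD" denotes least absolute deviation and that the ensemble weights are non-negative and sum to one; neither detail is stated in the abstract. The function name `lad_ensemble_weights`, the grid step, and the toy numbers are hypothetical and are not taken from the paper.

```python
from itertools import product


def mean_abs_error(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def lad_ensemble_weights(y_true, base_preds, step=0.05):
    """Grid-search convex weights that minimise the absolute error.

    base_preds holds three prediction sequences (for example the outputs
    of RF, XGBoost, and LightGBM).
    Assumption: weights are >= 0 and sum to 1.
    """
    n_steps = round(1 / step)
    best_w, best_err = None, float("inf")
    for i, j in product(range(n_steps + 1), repeat=2):
        if i + j > n_steps:
            continue  # third weight would be negative
        w = (i * step, j * step, (n_steps - i - j) * step)
        blended = [
            w[0] * a + w[1] * b + w[2] * c
            for a, b, c in zip(*base_preds)
        ]
        err = mean_abs_error(y_true, blended)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


# Toy data (hypothetical, not from the paper).
y = [100.0, 120.0, 90.0, 110.0]
rf = [90.0, 130.0, 95.0, 100.0]
xgb = [105.0, 115.0, 85.0, 112.0]
lgbm = [102.0, 118.0, 92.0, 108.0]

weights, err = lad_ensemble_weights(y, [rf, xgb, lgbm])
best_single = min(mean_abs_error(y, p) for p in (rf, xgb, lgbm))
print(weights, err, best_single)
```

Because the grid includes the three corner cases where one learner gets all the weight, the blended error on the fitting data can never exceed that of the best single learner. Whether the gain carries over to unseen data is a separate question that needs a held-out set.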
3
Palermo MB, Policarpo LM, Costa CAD, Righi RDR. Tracking machine learning models for pandemic scenarios: a systematic review of machine learning models that predict local and global evolution of pandemics. Network Modeling and Analysis in Health Informatics and Bioinformatics 2022; 11:40. [PMID: 36249862 PMCID: PMC9553296 DOI: 10.1007/s13721-022-00384-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 09/09/2022] [Accepted: 09/20/2022] [Indexed: 11/26/2022]
Abstract
This systematic review aims to study and classify machine learning models that predict the evolution of pandemics within affected regions or countries. The advantage of this systematic review is that it allows health authorities to decide which prediction model fits best depending on the region's criticality and to optimize hospitals' approaches to preparing for and anticipating patient care. We searched ACM Digital Library, Biomed Central, BioRxiv+MedRxiv, BMJ, Computers and Applied Sciences, IEEEXplore, JMIR Medical Informatics, Medline Daily Updates, Nature, Oxford Academic, PubMed, Sage Online, ScienceDirect, Scopus, SpringerLink, Web of Science, and Wiley Online Library between 1 January 2020 and 31 July 2022. We divided the interventions into similarities between cumulative real COVID-19 cases and the ability of machine learning prediction models to track pandemic trends. We included 45 studies that rated from low to high risk of bias. The standardized mean difference (SMD) for the two groups was 0.18 (95% CI [0.01, 0.35]), with I² = 0 and p = 0.04. We built a taxonomic analysis of the included studies and determined two domains: pandemic trend prediction models and geolocation tracking models. We performed the meta-analysis and data synthesis and found low publication bias due to missing results. The level of certainty varied from very low to high. By assessing the 45 studies for risk of bias, levels of certainty, summary of findings, and statistical analysis via forest and funnel plots, we determined satisfactory statistical significance and homogeneity across the included studies to simulate the progress of pandemics and help healthcare authorities take preventive decisions.
Affiliation(s)
- Marcelo Benedeti Palermo
  - Software Innovation Laboratory-SOFTWARELAB, Programa de Pós-Graduação em Computação Aplicada, Universidade do Vale do Rio dos Sinos, Av. Unisinos 950, São Leopoldo, RS 93022-750, Brazil
- Lucas Micol Policarpo
  - Software Innovation Laboratory-SOFTWARELAB, Programa de Pós-Graduação em Computação Aplicada, Universidade do Vale do Rio dos Sinos, Av. Unisinos 950, São Leopoldo, RS 93022-750, Brazil
- Cristiano André da Costa
  - Software Innovation Laboratory-SOFTWARELAB, Programa de Pós-Graduação em Computação Aplicada, Universidade do Vale do Rio dos Sinos, Av. Unisinos 950, São Leopoldo, RS 93022-750, Brazil
- Rodrigo da Rosa Righi
  - Software Innovation Laboratory-SOFTWARELAB, Programa de Pós-Graduação em Computação Aplicada, Universidade do Vale do Rio dos Sinos, Av. Unisinos 950, São Leopoldo, RS 93022-750, Brazil
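
As a rough consistency check on the pooled effect reported in the abstract of this entry (SMD 0.18, 95% CI 0.01 to 0.35, p = 0.04), the sketch below recovers the standard error, z statistic, and two-sided p-value from the interval. It assumes a symmetric interval built from a normal approximation (estimate ± 1.96 × SE), which the abstract does not state; the function name `p_from_ci` is hypothetical.

```python
import math


def p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Recover SE, z, and a two-sided p-value from a 95% CI.

    Assumption: the interval is symmetric and based on a normal
    approximation, i.e. estimate +/- z_crit * SE.
    """
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


se, z, p = p_from_ci(0.18, 0.01, 0.35)
print(f"SE={se:.3f}, z={z:.2f}, p={p:.3f}")
```

Under these assumptions the recovered p-value is close to 0.04, so the reported interval and p-value agree with each other.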
4
Moosazadeh M, Ifaei P, Tayerani Charmchi AS, Asadi S, Yoo C. A machine learning-driven spatio-temporal vulnerability appraisal based on socio-economic data for COVID-19 impact prevention in the U.S. counties. Sustainable Cities and Society 2022; 83:103990. [PMID: 35692599 PMCID: PMC9167466 DOI: 10.1016/j.scs.2022.103990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2022] [Revised: 06/04/2022] [Accepted: 06/04/2022] [Indexed: 05/02/2023]
Abstract
A mature and hybrid machine-learning model is verified by mature empirical analysis to measure county-level COVID-19 vulnerability and track the impact of pandemic control policies imposed in the U.S. A total of 30 county-level social, economic, and medical variables and a timeline of the imposed policies constitute a COVID-19 database. A hybrid feature-selection model composed of four machine-learning algorithms is developed to emphasize the regional impact of community features on the case fatality rate (CFR). A COVID-19 vulnerability index (COVULin) is proposed to measure a county's vulnerability, the effects of the model's parameters on mortality, and the efficiency of control policies. The results showed that dense counties in which minority groups represent more than 45% of the population and counties with poverty rates greater than 24% were the most vulnerable during the first and the last pandemic peaks, respectively. Highly correlated CFR and COVULin scores indicated close agreement between the model outcomes and COVID-19 impacts. Counties with higher poverty and uninsured rates were the most resistant to government intervention. It is anticipated that the proposed model can play an essential role in identifying vulnerable communities and help reduce damage during similar long-term disasters.
Affiliation(s)
- Mohammad Moosazadeh
  - Department of Environmental Science and Engineering, Center for Environmental Studies, College of Engineering, Kyung Hee University, Seocheon-dong 1, Giheung-gu, Yongin-Si, Gyeonggi-Do 446-701, South Korea
- Pouya Ifaei
  - Department of Environmental Science and Engineering, Center for Environmental Studies, College of Engineering, Kyung Hee University, Seocheon-dong 1, Giheung-gu, Yongin-Si, Gyeonggi-Do 446-701, South Korea
- Amir Saman Tayerani Charmchi
  - Department of Environmental Science and Engineering, Center for Environmental Studies, College of Engineering, Kyung Hee University, Seocheon-dong 1, Giheung-gu, Yongin-Si, Gyeonggi-Do 446-701, South Korea
- Somayeh Asadi
  - Department of Architectural Engineering, Pennsylvania State University, 213 Engineering Unit, University Park, PA 16802, United States
- ChangKyoo Yoo
  - Department of Environmental Science and Engineering, Center for Environmental Studies, College of Engineering, Kyung Hee University, Seocheon-dong 1, Giheung-gu, Yongin-Si, Gyeonggi-Do 446-701, South Korea
5
Bouzouina L, Kourtit K, Nijkamp P. Impact of immobility and mobility activities on the spread of COVID‐19: Evidence from European countries. Regional Science Policy & Practice 2022; 14:10.1111/rsp3.12565. [PMCID: PMC9349732 DOI: 10.1111/rsp3.12565] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/07/2022] [Revised: 05/20/2022] [Accepted: 07/01/2022] [Indexed: 06/19/2023]
Abstract
To limit the spread of COVID‐19, most countries in the world have put in place measures which restrict mobility. The co‐presence of several people in the same place of work, shopping, leisure or transport is considered a favourable vector for the transmission of the virus. However, this hypothesis remains to be verified in the light of the daily data available since the first wave of contamination. Does immobility reduce the spread of the COVID‐19 pandemic? Does mobility contribute to the increase in the number of infections for all activities? This paper applies several pooled mean group–autoregressive distributed lag (PMG–ARDL) models to investigate the impact of immobility and daily mobility activities on the spread of the COVID‐19 pandemic in European countries using daily data for the period from 12 March 2020 to 31 August 2021. The results of the PMG–ARDL models show that immobility and higher temperatures play a significant role in reducing the spread of the COVID‐19 pandemic. The increase in mobility activities (grocery, retail, use of transit) is also positively associated with the number of new COVID‐19 cases. The combined analysis with the Granger causality test shows that the relationship between mobility and COVID‐19 goes in both directions, with the exception of grocery shopping, visits to parks and commuting mobility. Grocery shopping favours the spread of COVID‐19, while the latter two have no causal relationship with COVID‐19. The results confirm the role of immobility in mitigating the spread of the pandemic, but call into question the drastic policies of systematically closing all places of activity.
Affiliation(s)
- Louafi Bouzouina
  - LAET, ENTPE, University of Lyon, France
  - Open University, Heerlen, The Netherlands
6
Asif Z, Chen Z, Stranges S, Zhao X, Sadiq R, Olea-Popelka F, Peng C, Haghighat F, Yu T. Dynamics of SARS-CoV-2 spreading under the influence of environmental factors and strategies to tackle the pandemic: A systematic review. Sustainable Cities and Society 2022; 81:103840. [PMID: 35317188 PMCID: PMC8925199 DOI: 10.1016/j.scs.2022.103840] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 12/14/2021] [Revised: 03/10/2022] [Accepted: 03/12/2022] [Indexed: 05/05/2023]
Abstract
COVID-19 is deemed the most critical world health calamity of the 21st century, leading to dramatic loss of life. Given the possibility of multiple future waves of COVID-19, there is a pressing need to understand its multi-stage dynamics, including the transmission routes of the virus and the relevant environmental conditions. In this paper, a systematic examination of the literature is conducted on virus-laden aerosols and the transmission of these microparticles into the multimedia environment, including built environments. In particular, this paper provides a critical review of state-of-the-art modelling tools suited to COVID-19 spread and transmission pathways. GIS-based, risk-based, and artificial intelligence-based tools are discussed for their application in the surveillance and forecasting of COVID-19. Primary environmental factors that act as stimulators for the spread of the virus include meteorological variation, low air quality, pollen abundance, and spatial-temporal variation. However, the influence of these environmental factors on COVID-19 spread is still equivocal because of other non-pharmaceutical factors. The limitations of different modelling methods suggest the need for a multidisciplinary approach, including the 'One-Health' concept. Extended One-Health-based decision tools would assist policymakers in making informed decisions on matters such as social gatherings, indoor environment improvement, and COVID-19 risk mitigation by adapting the control measures.
Affiliation(s)
- Zunaira Asif
  - Department of Building, Civil and Environmental Engineering, Concordia University, Montreal, Canada
- Zhi Chen
  - Department of Building, Civil and Environmental Engineering, Concordia University, Montreal, Canada
- Saverio Stranges
  - Department of Epidemiology and Biostatistics, Western University, Ontario, Canada
  - Department of Precision Health, Luxembourg Institute of Health, Strassen, Luxembourg
- Xin Zhao
  - Department of Animal Science, McGill University, Montreal, Canada
- Rehan Sadiq
  - School of Engineering (Okanagan Campus), University of British Columbia, Kelowna, BC, Canada
- Changhui Peng
  - Department of Biological Sciences, University of Quebec in Montreal, Canada
- Fariborz Haghighat
  - Department of Building, Civil and Environmental Engineering, Concordia University, Montreal, Canada
- Tong Yu
  - Department of Civil and Environmental Engineering, University of Alberta, Canada
7
Data-driven multiscale modelling and analysis of COVID-19 spatiotemporal evolution using explainable AI. Sustainable Cities and Society 2022; 80:103772. [PMID: 35186668 PMCID: PMC8832881 DOI: 10.1016/j.scs.2022.103772] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/11/2021] [Revised: 01/27/2022] [Accepted: 02/10/2022] [Indexed: 05/21/2023]
Abstract
To quantitatively identify the optimal control measures for regulators to best minimize COVID-19's growth (G-rate) and death (D-rate) rates in today's context, this paper develops a top-down multiscale engineering approach which encompasses a series of systematic analyses, namely: (global scale) predictive modelling of G-rate and D-rate due to COVID-19 globally, followed by determining the most effective control factors which can best minimize both parameters over time via explainable Artificial Intelligence (AI) with the SHAP (SHapley Additive exPlanations) method; (continental scale) the same predictive forecasting of G-rate and D-rate in all continents, followed by explainable SHAP analysis to determine the most effective control factors for the respective continents; and (country scale) clustering the different countries (> 150 in total) into 3 main clusters to identify the universal set of effective control measures. Using the historical period between 2 May 2020 and 1 Oct 2021, the average MAPE scores for forecasting G-rate and D-rate are 10% or less at the global and continental scales. We have quantitatively demonstrated that the top 3 most effective control measures for regulators to best minimize G-rate universally are COVID-CONTACT-TRACING, PUBLIC-GATHERING-RULES, and COVID-STRINGENCY-INDEX, while the control factors relating to D-rate depend on the modelling scenario.
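
The MAPE score cited in the abstract of this entry can be sketched as follows. This is a minimal illustration of the standard mean absolute percentage error, not the paper's code; the function name `mape`, the choice to skip zero actual values, and the sample values are assumptions made for the example.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent.

    Points where the actual value is zero are skipped because the ratio
    is undefined there (an assumption; the paper's handling is unknown).
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all actual values are zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


# Hypothetical growth-rate values, for illustration only.
actual = [2.0, 4.0, 5.0]
forecast = [1.8, 4.4, 5.0]
score = mape(actual, forecast)
print(f"MAPE = {score:.2f}%")
```

A score of 10% or less, as reported above, means the forecasts deviate from the observed rates by at most a tenth of the observed value on average.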
8
An Innovative Index for Evaluating Urban Vulnerability on Pandemic Using LambdaMART Algorithm. Sustainability 2022. [DOI: 10.3390/su14095053] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
The COVID-19 pandemic has significantly changed urban life, and increased attention has been paid to the pandemic in discussions of urban vulnerability. There is a lack of methods to incorporate dynamic indicators such as urban vitality into evaluations of urban pandemic vulnerability. In this research, we use machine learning to establish an urban Pandemic Vulnerability Index (PVI) that measures a city's vulnerability to the pandemic and treats dynamic indicators as an important aspect of it. The proposed PVI is constructed using 140 statistical variables and 10 dynamic variables, using data from 47 prefectures of Japan. Factor Analysis is used to extract factors from variables that may affect city vulnerability, and the LambdaMART algorithm is used to aggregate factors and predict vulnerability. The results show that the proposed PVI can predict the relative seriousness of the COVID-19 pandemic two weeks ahead with a precision of more than 0.71, which is meaningful for taking control measures in advance and shaping society's response. Further analysis revealed the key factors affecting urban pandemic vulnerability, including city size, transit station vitality, and medical facilities, emphasizing precautions for public transport systems and new planning concepts such as the compact city. This research explores the application of machine learning techniques in indicator establishment and incorporates dynamic factors into vulnerability assessments, which contributes to improvements in urban vulnerability assessments and the planning of sustainable cities while facing the challenges of the COVID-19 pandemic.
9
Guo Y, Zhang N, Hu T, Wang Z, Zhang Y. Optimization of energy efficiency and COVID-19 pandemic control in different indoor environments. Energy and Buildings 2022; 261:111954. [PMID: 35185270 PMCID: PMC8848536 DOI: 10.1016/j.enbuild.2022.111954] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/24/2021] [Revised: 01/23/2022] [Accepted: 02/12/2022] [Indexed: 05/11/2023]
Abstract
The COVID-19 pandemic has led to considerable morbidity and mortality, and controlling and preventing the disease has consumed enormous resources (e.g. energy). It is crucial to balance infection risk and energy consumption when reducing the spread of infection. In this study, a quantitative, human behavior-based infection risk-energy consumption model for different indoor environments was developed. An optimal balance point for each indoor environment can be obtained using the anti-problem method. For this study we selected Wangjing Block, one of the most densely populated places in Beijing, as an example. Under the current ventilation standard (30 m³/h/person), prevention and control of the COVID-19 pandemic would be insufficient because the basic reproduction numbers (R0) for students, workers and elders are greater than 1. The optimal required fresh air ventilation rates in most indoor environments are near or below 60 m³/h/person, after considering the combined effects of multiple mitigation measures. In residences, sports buildings and restaurants, the demand for fresh air ventilation rate is relatively high. After our global optimization of infection risk control (R0 ≤ 1), energy consumption can be reduced by 13.7% and 45.1% on weekdays and weekends, respectively, in contrast to a strategy of strict control (R0 = 1 for each indoor environment).
Affiliation(s)
- Yong Guo
  - Department of Building Science, Tsinghua University, Beijing, China
  - Beijing Key Laboratory of Indoor Air Quality Evaluation and Control, Beijing, China
- Nan Zhang
  - Beijing Key Laboratory of Green Built Environment and Energy Efficient Technology, Beijing University of Technology, Beijing, China
- Tingrui Hu
  - Beijing Key Laboratory of Green Built Environment and Energy Efficient Technology, Beijing University of Technology, Beijing, China
- Zhenyu Wang
  - College of Economics and Management, Beijing University of Technology, Beijing, China
- Yinping Zhang
  - Department of Building Science, Tsinghua University, Beijing, China
  - Beijing Key Laboratory of Indoor Air Quality Evaluation and Control, Beijing, China
10
Pan Y, Zhang L, Unwin J, Skibniewski MJ. Discovering spatial-temporal patterns via complex networks in investigating COVID-19 pandemic in the United States. Sustainable Cities and Society 2022; 77:103508. [PMID: 34931157 PMCID: PMC8674122 DOI: 10.1016/j.scs.2021.103508] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/15/2021] [Revised: 09/26/2021] [Accepted: 10/22/2021] [Indexed: 05/22/2023]
Abstract
A novel approach combining time series analysis and complex network theory is proposed to explore in depth the characteristics of the COVID-19 pandemic in some parts of the United States (US). It emerges as a new way to provide a systematic view of, and complementary information on, COVID-19 progression in the US, enabling evidence-based responses for pandemic intervention and prevention. To begin with, Principal Component Analysis (PCA) with varimax rotation is adopted to fuse observed time-series data about the pandemic evolution in each state across the US. Then, relationships between the pandemic progress of pairs of individual states are measured by different synchrony metrics, which can then be mapped into networks with unique topological characteristics. Lastly, the hidden knowledge in the established networks can be revealed from different perspectives by network structure measurement, community detection, and online random forest, which helps to inform data-driven decisions for battling the pandemic. It has been found that states gathered in the same community by diffusion entropy reducer (DER) tend to be geographically close and to share a similar pattern and tendency of COVID-19 evolution. Social factors such as political party, Gross Domestic Product (GDP), and population density may be significantly associated with the two detected communities within a constructed network. Moreover, the cluster-specific predictor based on online random forest and a sliding window is proven useful in dynamically capturing and predicting the epidemiological trends for each community, which can reach the highest.
Affiliation(s)
- Yue Pan
  - Shanghai Key Laboratory for Digital Maintenance of Buildings and Infrastructure, Department of Civil Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Limao Zhang
  - School of Civil and Environmental Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
- Juliette Unwin
  - MRC Centre for Global Infectious Disease Analysis, United Kingdom
- Miroslaw J Skibniewski
  - Department of Civil and Environmental Engineering, University of Maryland, College Park, MD 20742-3021, USA
  - Chaoyang University of Technology, 413310 Taichung, Taiwan
  - Polish Academy of Sciences Institute of Theoretical and Applied Informatics, 44-100 Gliwice, Poland
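
The step in the abstract of this entry where pairwise synchrony between states is mapped into a network can be illustrated with a minimal sketch. The abstract does not say which synchrony metrics were used, so Pearson correlation with a fixed threshold is a stand-in chosen for the example; the function names, the threshold of 0.8, and the three toy series are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def synchrony_network(series_by_state, threshold=0.8):
    """Link two states when their series correlate at or above threshold."""
    edges = set()
    for a, b in combinations(sorted(series_by_state), 2):
        if pearson(series_by_state[a], series_by_state[b]) >= threshold:
            edges.add((a, b))
    return edges


# Hypothetical fused time series for three states.
data = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 4.1, 6.0, 8.2, 10.0],
    "C": [5.0, 3.0, 4.0, 1.0, 2.0],
}
edges = synchrony_network(data)
print(edges)
```

The resulting edge set is the input that community detection would then operate on. A different synchrony metric or threshold would change which states end up connected.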
11
Chew AWZ, Pan Y, Wang Y, Zhang L. Hybrid deep learning of social media big data for predicting the evolution of COVID-19 transmission. Knowl Based Syst 2021; 233:107417. [PMID: 34690447 PMCID: PMC8522122 DOI: 10.1016/j.knosys.2021.107417] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/01/2021] [Revised: 07/14/2021] [Accepted: 08/18/2021] [Indexed: 11/22/2022]
Abstract
In this study, a hybrid deep-learning model termed ODANN, built upon neural networks (NN) coupled with data assimilation and natural language processing (NLP) feature extraction methods, has been constructed to concurrently process daily COVID-19 time-series records and large volumes of COVID-19 related Twitter data, as representative of the global community's aggregated emotional responses towards the current pandemic, to model the growth rate in the number of confirmed COVID-19 cases globally via a proposed G parameter. Overall, there were 3 key components to ODANN's development phase, namely: (i) data hydration and pre-processing were performed on COVID-19 related Twitter data ranging between 23 January 2020 and 10 May 2020, which amounted to over 100 million Tweets written in English; (ii) multiple NLP feature extraction methods were subsequently leveraged to encode the hydrated Twitter data into useful semantic word vectors for training ODANN under an optimal set of hyperparameters; and (iii) historical time-series data of defined characteristics were also assimilated into ODANN's selected hidden layer(s) to model the G parameter daily with a lead time of 1 day. Our experimental results demonstrated that adopting a rolling time-window size of 5 days, with respect to the number of historical time-series records for assimilating different data features, enabled ODANN to outperform other traditional time-series models and recent studies in terms of the RMSE and MAE scores computed in the model's testing step. Overall, the summarized results from ODANN demonstrated its competitive edge in modelling and forecasting the growth rate in the number of COVID-19 cases globally.
Affiliation(s)
- Alvin Wei Ze Chew
  - Bentley Systems Research Office, 1 Harbourfront Pl, HarbourFront Tower One, Singapore 098633, Singapore
- Yue Pan
  - Shanghai Key Laboratory for Digital Maintenance of Buildings and Infrastructure, Department of Civil Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, China
- Ying Wang
  - School of Civil and Environmental Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
- Limao Zhang
  - School of Civil and Environmental Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
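
The rolling 5-day window with a 1-day lead time, and the RMSE and MAE scores mentioned in the abstract of this entry, can be sketched as follows. This is not ODANN: the window-mean predictor is a naive stand-in, and the function names and sample values are hypothetical.

```python
import math


def rolling_windows(series, window=5, lead=1):
    """Turn a series into (history, target) pairs.

    Each sample uses `window` consecutive values to predict the value
    `lead` steps after the window ends.
    """
    samples = []
    for start in range(len(series) - window - lead + 1):
        history = series[start:start + window]
        target = series[start + window + lead - 1]
        samples.append((history, target))
    return samples


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical daily growth-rate values, for illustration only.
g = [1.0, 1.2, 1.1, 1.3, 1.4, 1.5, 1.3, 1.2]
samples = rolling_windows(g)
# Naive baseline: predict the mean of each window (a stand-in, not ODANN).
preds = [sum(h) / len(h) for h, _ in samples]
targets = [t for _, t in samples]
print(rmse(targets, preds), mae(targets, preds))
```

RMSE penalises large misses more heavily than MAE, which is one reason both are commonly reported together when comparing forecasting models.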