1
Xiong X, Xiang L, Chang L, Wu IX, Deng S. Forecasting the Incidence of Mumps Based on the Baidu Index and Environmental Data in Yunnan, China: Deep Learning Model Study. J Med Internet Res 2025; 27:e66072. [PMID: 39913179 PMCID: PMC11843052 DOI: 10.2196/66072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Revised: 10/04/2024] [Accepted: 12/04/2024] [Indexed: 02/07/2025]
Abstract
BACKGROUND Mumps is a viral respiratory disease characterized by facial swelling and transmitted through respiratory secretions. Despite the availability of an effective vaccine, mumps outbreaks have reemerged globally, including in China, where it remains a significant public health issue. In Yunnan province, China, the incidence of mumps has fluctuated markedly and is higher than that in mainland China, underscoring the need for improved outbreak prediction methods. Traditional surveillance methods, however, may not be sufficient for timely and accurate outbreak prediction. OBJECTIVE Our study aims to leverage the Baidu search index, representing search volumes from China's most popular search engine, along with environmental data to develop a predictive model for mumps incidence in Yunnan province. METHODS We analyzed mumps incidence in Yunnan Province from 2014 to 2023, and used time series data, including mumps incidence, Baidu search index, and environmental factors, from 2016 to 2023, to develop predictive models based on long short-term memory networks. Feature selection was conducted using Pearson correlation analysis, and lag correlations were explored through a distributed nonlinear lag model (DNLM). We constructed four models with different combinations of predictors: (1) model BE, combining the Baidu index and environmental factors data; (2) model IB, combining mumps incidence and Baidu index data; (3) model IE, combining mumps incidence and environmental factors; and (4) model IBE, integrating all 3 data sources. RESULTS The incidence of mumps in Yunnan showed significant variability, peaking at 37.5 per 100,000 population in 2019. From 2014 to 2023, the proportion of female patients ranged from 41.3% in 2015 to 45.7% in 2020, consistently lower than that of male patients. 
After excluding variables with a Pearson correlation coefficient of <0.10 or P values of <.05, we included 3 Baidu index search term groups (disease name, symptoms, and treatment) and 6 environmental factors (maximum temperature, minimum temperature, sulfur dioxide, carbon monoxide, particulate matter with a diameter of 2.5 µm or less, and particulate matter with a diameter of 10 µm or less) for model development. DNLM analysis revealed that the relative risks consistently increased with rising Baidu index values, while nonlinear associations between temperature and mumps incidence were observed. Among the 4 models, model IBE exhibited the best performance, achieving a coefficient of determination of 0.72, with mean absolute error, mean absolute percentage error, and root-mean-square error values of 0.33, 15.9%, and 0.43, respectively, in the test set. CONCLUSIONS Our study developed model IBE to predict the incidence of mumps in Yunnan province, offering a potential tool for early detection of mumps outbreaks. The performance of model IBE underscores the potential of integrating search engine data and environmental factors to enhance mumps incidence forecasting. This approach offers a promising tool for improving public health surveillance and enabling rapid responses to mumps outbreaks.
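The four test-set measures reported in this abstract (coefficient of determination, mean absolute error, mean absolute percentage error, and root-mean-square error) can be computed from paired observed and predicted values. The following is a minimal sketch using the standard textbook definitions; the function name `regression_metrics` and the sample values are hypothetical and are not taken from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return R^2, MAE, MAPE (in percent), and RMSE for paired sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(o == 0 for o in observed):
        # MAPE divides by the observed value, so zeros are rejected here.
        raise ValueError("MAPE is undefined for zero observed values")

    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant observed series")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Hypothetical incidence values for illustration only (not study data).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

MAE and RMSE are in the units of the incidence series, whereas MAPE is relative, which is why the abstract reports it as a percentage.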
Affiliation(s)
- Xin Xiong
  - Department of Epidemiology and Health Statistics, Xiangya School of Public Health, Central South University, Changsha, China
- Linghui Xiang
  - Department of Epidemiology and Health Statistics, Xiangya School of Public Health, Central South University, Changsha, China
- Litao Chang
  - Department of School Health, Yunnan Provincial Center for Disease Control and Prevention, Kunming, China
- Irene Xy Wu
  - Department of Epidemiology and Health Statistics, Xiangya School of Public Health, Central South University, Changsha, China
  - Hunan Provincial Key Laboratory of Clinical Epidemiology, Central South University, Changsha, China
- Shuzhen Deng
  - Department of School Health, Yunnan Provincial Center for Disease Control and Prevention, Kunming, China
2
Luo J, Wang X, Fan X, He Y, Du X, Chen YQ, Zhao Y. A novel graph neural network based approach for influenza-like illness nowcasting: exploring the interplay of temporal, geographical, and functional spatial features. BMC Public Health 2025; 25:408. [PMID: 39893390 PMCID: PMC11786584 DOI: 10.1186/s12889-025-21618-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Accepted: 01/24/2025] [Indexed: 02/04/2025]
Abstract
BACKGROUND Accurate and timely monitoring of influenza prevalence is essential for effective healthcare interventions. This study proposes a graph neural network (GNN)-based method to address the issue of cross-regional connectivity in predicting influenza outbreaks, aiming to achieve real-time and accurate influenza prediction. METHODS We proposed a GNN-based approach with dual topology processing, capturing both geographical and socio-economic associations among counties/cities. The model inputs consist of weekly matrices of influenza-like illness (ILI) rates at city level, along with geographical topology and functional topology. The model construction involves temporal feature extraction through 1-dimensional gated causal convolution, spatial feature embedding through graph convolution, and additional adjustments to enhance spatiotemporal interaction exploration. Evaluation metrics include four commonly used measures: root mean square error (RMSE), mean absolute percentage error (MAPE), mean absolute error (MAE), and Pearson correlation (Corr). RESULTS Our approach for predicting influenza outbreaks achieves competitive performance on real-world datasets (Corr = 0.8202; RMSE = 0.0017; MAE = 0.0013; MAPE = 0.0966), surpassing established baselines. Notably, our approach exhibits excellent capability in accurately and timely capturing short-term influenza outbreaks during the flu season, outperforming competitors across all evaluation metrics. CONCLUSION The incorporation of dual topology processing and the subsequent fusion mechanism allows the model to explore in-depth spatiotemporal feature interactions. Demonstrating superior performance, our approach shows great potential in early detection of flu trends for facilitating public health decisions and resource optimization.
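The temporal feature extraction step named in this abstract, 1-dimensional gated causal convolution, can be illustrated on a single series. The sketch below is not the authors' implementation: the abstract does not state the gating function, so a tanh filter multiplied by a sigmoid gate is assumed here, and the kernels and ILI rates are hypothetical values.

```python
import math


def causal_conv1d(series, kernel):
    """Causal 1-D convolution: output[t] uses only series[t-k+1 .. t]."""
    k = len(kernel)
    # Left-pad with zeros so that no future value contributes to output[t].
    padded = [0.0] * (k - 1) + list(series)
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(series))
    ]


def gated_causal_conv1d(series, filter_kernel, gate_kernel):
    """Assumed gating form: tanh(filter branch) * sigmoid(gate branch)."""
    filt = causal_conv1d(series, filter_kernel)
    gate = causal_conv1d(series, gate_kernel)
    return [
        math.tanh(f) * (1.0 / (1.0 + math.exp(-g)))
        for f, g in zip(filt, gate)
    ]


# Hypothetical weekly ILI rates for one city (not study data).
weekly_ili = [0.010, 0.020, 0.050, 0.030]
features = gated_causal_conv1d(weekly_ili, [0.5, 1.0], [1.0, -1.0])

# Changing the last week leaves the features of earlier weeks unchanged,
# which is the property that makes the convolution "causal".
altered = weekly_ili[:3] + [0.900]
features_altered = gated_causal_conv1d(altered, [0.5, 1.0], [1.0, -1.0])
print(features[:3] == features_altered[:3])
```

In a full model the kernels would be learned and applied per city before the graph convolution mixes information across the geographical and functional topologies.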
Affiliation(s)
- Jiajia Luo
  - School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, 518107, Guangdong, China
- Xuan Wang
  - School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, 518107, Guangdong, China
- Xiaomao Fan
  - College of Big Data and Internet, Shenzhen Technology University, Shenzhen, 518118, Guangdong, China
- Yuxin He
  - College of Urban Transportation and Logistics, Shenzhen Technology University, Shenzhen, 518118, Guangdong, China
- Xiangjun Du
  - School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, 518107, Guangdong, China
- Yao-Qing Chen
  - School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, 518107, Guangdong, China
- Yang Zhao
  - School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, 518107, Guangdong, China
3
Begashaw GB, Zewotir T, Fenta HM. A deep learning approach for classifying and predicting children's nutritional status in Ethiopia using LSTM-FC neural networks. BioData Min 2025; 18:11. [PMID: 39885567 PMCID: PMC11783927 DOI: 10.1186/s13040-025-00425-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2024] [Accepted: 01/17/2025] [Indexed: 02/01/2025]
Abstract
BACKGROUND This study employs an LSTM-FC neural network to address the critical public health issue of child undernutrition in Ethiopia. Using this method, the study aims to classify children's nutritional status and predict transitions between different undernutrition states over time. This analysis is based on longitudinal data extracted from the Young Lives cohort study, which tracked 1,997 Ethiopian children across five survey rounds conducted from 2002 to 2016. This paper applies rigorous data preprocessing, including handling missing values, normalization, and balancing, to ensure optimal model performance. Feature selection was performed using SHapley Additive exPlanations to identify key factors influencing nutritional status predictions. Hyperparameter tuning was thoroughly applied during model training to optimize performance. Furthermore, this paper compares the performance of LSTM-FC with existing baseline models to demonstrate its superiority. We used Python's TensorFlow and Keras libraries on a GPU-equipped system for model training. RESULTS LSTM-FC demonstrated superior predictive accuracy and long-term forecasting compared to baseline models for assessing child nutritional status. The classification and prediction performance of the model showed high accuracy rates above 93%, with perfect predictions for Normal (N) and Stunted & Wasted (SW) categories, minimal errors in most other nutritional statuses, and slight over- or underestimations in a few instances. The LSTM-FC model demonstrates strong generalization performance across multiple folds, with high recall and consistent F1-scores, indicating its robustness in predicting nutritional status. We analyzed the prevalence of children's nutritional status during their transition from late adolescence to early adulthood. The results show a notable decline in normal nutritional status among males, decreasing from 58.3% at age 5 to 33.5% by age 25.
At the same time, the risk of severe undernutrition, including conditions of being underweight, stunted, and wasted (USW), increased from 1.3% to 9.4%. CONCLUSIONS The LSTM-FC model outperforms baseline methods in classifying and predicting Ethiopian children's nutritional statuses. The findings reveal a critical rise in undernutrition, emphasizing the need for urgent public health interventions.
Affiliation(s)
- Getnet Bogale Begashaw
  - Department of Statistics, College of Science, Bahir Dar University, P.O. Box 79, Bahir Dar, Ethiopia
  - Department of Data Science, College of Natural and Computational Science, Debre Berhan University, P.O. Box 445, Debre Berhan, Ethiopia
- Temesgen Zewotir
  - School of Mathematics, Statistics and Computer Science, College of Agriculture, Engineering and Science, University of KwaZulu-Natal, Durban, South Africa
- Haile Mekonnen Fenta
  - Department of Statistics, College of Science, Bahir Dar University, P.O. Box 79, Bahir Dar, Ethiopia
  - Center for Environmental and Respiratory Health Research, Population Health, University of Oulu, Oulu, Finland
  - Biocenter Oulu, University of Oulu, Oulu, Finland
4
Li G, Li Y, Han G, Jiang C, Geng M, Guo N, Wu W, Liu S, Xing Z, Han X, Li Q. Forecasting and analyzing influenza activity in Hebei Province, China, using a CNN-LSTM hybrid model. BMC Public Health 2024; 24:2171. [PMID: 39135162 PMCID: PMC11318307 DOI: 10.1186/s12889-024-19590-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Accepted: 07/25/2024] [Indexed: 08/16/2024]
Abstract
BACKGROUND Influenza, an acute infectious respiratory disease, presents a significant global health challenge. Accurate prediction of influenza activity is crucial for reducing its impact. Therefore, this study seeks to develop a hybrid Convolutional Neural Network-Long Short-Term Memory neural network (CNN-LSTM) model to forecast the influenza-like illness percentage (ILI%) in Hebei Province, China. The aim is to provide more precise guidance for influenza prevention and control measures. METHODS Using ILI% data from 28 national sentinel hospitals in Hebei Province, spanning from 2010 to 2022, we employed the Python deep learning framework PyTorch to develop the CNN-LSTM model. Additionally, we utilized R and Python to develop four other models commonly used for predicting infectious diseases. After constructing the models, we used them to make retrospective predictions and compared each model's prediction performance using mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and other evaluation metrics. RESULTS Based on historical ILI% data from 28 national sentinel hospitals in Hebei Province, the Seasonal Autoregressive Integrated Moving Average (SARIMA), Extreme Gradient Boosting (XGBoost), Convolutional Neural Network (CNN), and Long Short-Term Memory neural network (LSTM) models were constructed. On the testing set, all models effectively predicted the ILI% trends. Subsequently, these models were used to forecast over different time spans. Across the various forecasting periods, the CNN-LSTM model demonstrated the best predictive performance, followed by the XGBoost, LSTM, and CNN models, with the SARIMA model performing worst. CONCLUSION The hybrid CNN-LSTM model had better prediction performance than the SARIMA, CNN, LSTM, and XGBoost models. This hybrid model could provide more accurate influenza activity projections in Hebei Province.
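Forecasting "over different time spans" with sequence models such as CNN, LSTM, or CNN-LSTM usually starts by turning the ILI% series into fixed-length input windows paired with a target some steps ahead. The sketch below shows that common preprocessing step only; it is an assumption about a typical setup rather than the authors' pipeline, and the function name `make_windows`, the window length, the horizon, and the sample values are all hypothetical.

```python
def make_windows(series, window, horizon=1):
    """Build (input window, target) pairs for supervised forecasting.

    window  -- number of past observations fed to the model
    horizon -- how many steps after the window the target lies
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    pairs = []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        inputs = series[start:start + window]
        target = series[start + window + horizon - 1]
        pairs.append((inputs, target))
    return pairs


# Hypothetical weekly ILI% values (not study data).
ili_percent = [1.0, 2.0, 3.0, 4.0, 5.0]
one_step = make_windows(ili_percent, window=3, horizon=1)
two_step = make_windows(ili_percent, window=3, horizon=2)
print(one_step)
print(two_step)
```

Increasing `horizon` shortens the set of usable pairs, which is one reason longer-range forecasts tend to be evaluated on fewer points.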
Affiliation(s)
- Guofan Li
  - School of Public Health, Hebei Medical University, No.361, Zhongshan East Road, Shijiazhuang, Hebei Province, 050017, China
- Yan Li
  - Hebei Provincial Center for Disease Control and Prevention, No.97, Huai'an East Road, Shijiazhuang, Hebei Province, 050021, China
- Guangyue Han
  - Hebei Provincial Center for Disease Control and Prevention, No.97, Huai'an East Road, Shijiazhuang, Hebei Province, 050021, China
- Caixiao Jiang
  - Hebei Provincial Center for Disease Control and Prevention, No.97, Huai'an East Road, Shijiazhuang, Hebei Province, 050021, China
- Minghao Geng
  - Hebei Provincial Center for Disease Control and Prevention, No.97, Huai'an East Road, Shijiazhuang, Hebei Province, 050021, China
- Nana Guo
  - Hebei Provincial Center for Disease Control and Prevention, No.97, Huai'an East Road, Shijiazhuang, Hebei Province, 050021, China
- Wentao Wu
  - Hebei Provincial Center for Disease Control and Prevention, No.97, Huai'an East Road, Shijiazhuang, Hebei Province, 050021, China
- Shangze Liu
  - Hebei Provincial Center for Disease Control and Prevention, No.97, Huai'an East Road, Shijiazhuang, Hebei Province, 050021, China
- Zhihuai Xing
  - Hebei Provincial Center for Disease Control and Prevention, No.97, Huai'an East Road, Shijiazhuang, Hebei Province, 050021, China
- Xu Han
  - Hebei Provincial Center for Disease Control and Prevention, No.97, Huai'an East Road, Shijiazhuang, Hebei Province, 050021, China
- Qi Li
  - School of Public Health, Hebei Medical University, No.361, Zhongshan East Road, Shijiazhuang, Hebei Province, 050017, China
  - Hebei Provincial Center for Disease Control and Prevention, No.97, Huai'an East Road, Shijiazhuang, Hebei Province, 050021, China
5
Chen Q, Zheng X, Shi H, Zhou Q, Hu H, Sun M, Xu Y, Zhang X. Prediction of influenza outbreaks in Fuzhou, China: comparative analysis of forecasting models. BMC Public Health 2024; 24:1399. [PMID: 38796443 PMCID: PMC11127308 DOI: 10.1186/s12889-024-18583-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Accepted: 04/12/2024] [Indexed: 05/28/2024]
Abstract
BACKGROUND Influenza is a highly contagious respiratory disease that presents a significant challenge to public health globally. Therefore, effective influenza prediction and prevention are crucial for the timely allocation of resources, the development of vaccine strategies, and the implementation of targeted public health interventions. METHOD In this study, we utilized historical influenza case data from January 2013 to December 2021 in Fuzhou to develop four regression prediction models: SARIMA, Prophet, Holt-Winters, and XGBoost. Their predictive performance was assessed using influenza data from January 2022 to December 2022 in Fuzhou. These models were used for fitting and prediction analysis. The evaluation metrics, including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE), were employed to compare the performance of these models. RESULTS The results indicate that the influenza epidemic in Fuzhou exhibits a distinct seasonal and cyclical pattern. The influenza case data displayed a noticeable upward trend and significant fluctuations. In our study, we employed SARIMA, Prophet, Holt-Winters, and XGBoost models to predict influenza outbreaks in Fuzhou. Among these models, the XGBoost model demonstrated the best performance on both the training and test sets, yielding the lowest values for MSE, RMSE, and MAE among the four models. CONCLUSION The utilization of the XGBoost model significantly enhances the prediction accuracy of influenza in Fuzhou. This study makes a valuable contribution to the field of influenza prediction and provides substantial support for future influenza response efforts.
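The evaluation design in this abstract is a chronological hold-out: models are fitted on January 2013 to December 2021 and assessed on January to December 2022. The following sketch shows such a split, assuming monthly records purely for illustration (the abstract does not state the data frequency); the function name `chronological_split` is hypothetical, and the zero case counts are placeholders rather than real data.

```python
from datetime import date


def chronological_split(records, test_start):
    """Split (date, value) records so that the test set is strictly later in time."""
    train = [r for r in records if r[0] < test_start]
    test = [r for r in records if r[0] >= test_start]
    return train, test


# Placeholder monthly records for 2013-2022; the counts are dummy zeros.
records = [
    (date(year, month, 1), 0)
    for year in range(2013, 2023)
    for month in range(1, 13)
]

train, test = chronological_split(records, date(2022, 1, 1))
print(len(train), len(test))
print(train[-1][0], test[0][0])
```

Splitting by date rather than at random keeps future observations out of the training set, which matters for seasonal series such as influenza counts.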
Affiliation(s)
- Qingquan Chen
  - The Affiliated Fuzhou Center for Disease Control and Prevention of Fujian Medical University, Fuzhou, 350005, China
  - The School of Public Health, Fujian Medical University, Fuzhou, 350108, China
- Xiaoyan Zheng
  - The Affiliated Fuzhou Center for Disease Control and Prevention of Fujian Medical University, Fuzhou, 350005, China
  - The School of Public Health, Fujian Medical University, Fuzhou, 350108, China
- Huanhuan Shi
  - The Affiliated Fuzhou Center for Disease Control and Prevention of Fujian Medical University, Fuzhou, 350005, China
  - The School of Public Health, Fujian Medical University, Fuzhou, 350108, China
- Quan Zhou
  - The Affiliated Fuzhou Center for Disease Control and Prevention of Fujian Medical University, Fuzhou, 350005, China
  - The School of Public Health, Fujian Medical University, Fuzhou, 350108, China
- Haiping Hu
  - The Affiliated Fuzhou Center for Disease Control and Prevention of Fujian Medical University, Fuzhou, 350005, China
  - The School of Public Health, Fujian Medical University, Fuzhou, 350108, China
- Mengcai Sun
  - The Affiliated Fuzhou Center for Disease Control and Prevention of Fujian Medical University, Fuzhou, 350005, China
  - The School of Public Health, Fujian Medical University, Fuzhou, 350108, China
- Youqiong Xu
  - The Affiliated Fuzhou Center for Disease Control and Prevention of Fujian Medical University, Fuzhou, 350005, China
  - The School of Public Health, Fujian Medical University, Fuzhou, 350108, China
- Xiaoyang Zhang
  - The Affiliated Fuzhou Center for Disease Control and Prevention of Fujian Medical University, Fuzhou, 350005, China
  - The School of Public Health, Fujian Medical University, Fuzhou, 350108, China
6
Shao J, Pan Y, Kou WB, Feng H, Zhao Y, Zhou K, Zhong S. Generalization of a Deep Learning Model for Continuous Glucose Monitoring-Based Hypoglycemia Prediction: Algorithm Development and Validation Study. JMIR Med Inform 2024; 12:e56909. [PMID: 38801705 PMCID: PMC11148841 DOI: 10.2196/56909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 04/07/2024] [Accepted: 05/04/2024] [Indexed: 05/29/2024]
Abstract
Background Predicting hypoglycemia while maintaining a low false alarm rate is a challenge for the wide adoption of continuous glucose monitoring (CGM) devices in diabetes management. One small study suggested that a deep learning model based on the long short-term memory (LSTM) network had better performance in hypoglycemia prediction than traditional machine learning algorithms in European patients with type 1 diabetes. However, given that many well-recognized deep learning models perform poorly outside the training setting, it remains unclear whether the LSTM model could be generalized to different populations or patients with other diabetes subtypes. Objective The aim of this study was to validate LSTM hypoglycemia prediction models in more diverse populations and across a wide spectrum of patients with different subtypes of diabetes. Methods We assembled two large data sets of patients with type 1 and type 2 diabetes. The primary data set including CGM data from 192 Chinese patients with diabetes was used to develop the LSTM, support vector machine (SVM), and random forest (RF) models for hypoglycemia prediction with a prediction horizon of 30 minutes. Hypoglycemia was categorized into mild (glucose=54-70 mg/dL) and severe (glucose<54 mg/dL) levels. The validation data set of 427 patients of European-American ancestry in the United States was used to validate the models and examine their generalizations. The predictive performance of the models was evaluated according to the sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). Results For the difficult-to-predict mild hypoglycemia events, the LSTM model consistently achieved AUC values greater than 97% in the primary data set, with a less than 3% AUC reduction in the validation data set, indicating that the model was robust and generalizable across populations. 
AUC values above 93% were also achieved when the LSTM model was applied to both type 1 and type 2 diabetes in the validation data set, further strengthening the generalizability of the model. Under different satisfactory levels of sensitivity for mild and severe hypoglycemia prediction, the LSTM model achieved higher specificity than the SVM and RF models, thereby reducing false alarms. Conclusions Our results demonstrate that the LSTM model is robust for hypoglycemia prediction and is generalizable across populations or diabetes subtypes. Given its additional advantage of false-alarm reduction, the LSTM model is a strong candidate to be widely implemented in future CGM devices for hypoglycemia prediction.
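The hypoglycemia levels defined in this abstract (mild at 54-70 mg/dL, severe below 54 mg/dL) and two of its evaluation measures, sensitivity and specificity, can be written out directly. This is a minimal sketch, not the study's code: treating both 54 and 70 mg/dL as mild is an assumption about boundary handling, and the function names and alarm labels are hypothetical.

```python
def hypoglycemia_level(glucose_mg_dl):
    """Classify a glucose reading using the thresholds given in the abstract.

    Assumption: the boundaries 54 and 70 mg/dL are both counted as mild.
    """
    if glucose_mg_dl < 54:
        return "severe"
    if glucose_mg_dl <= 70:
        return "mild"
    return "none"


def sensitivity_specificity(actual, predicted):
    """Compute sensitivity and specificity from paired boolean labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both positive and negative cases are required")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical event labels and alarms (not study data).
actual = [True, True, False, False, False]
predicted = [True, False, False, False, True]
sensitivity, specificity = sensitivity_specificity(actual, predicted)
print(hypoglycemia_level(62), sensitivity, specificity)
```

A false alarm corresponds to a false positive, so higher specificity at a fixed sensitivity means fewer false alarms, which is the comparison the abstract draws between the LSTM, SVM, and RF models.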
Affiliation(s)
- Jian Shao
  - Guangzhou Laboratory, Guangzhou, China
- Ying Pan
  - Department of Endocrinology, Kunshan Hospital Affiliated to Jiangsu University, Kunshan, China
- Wei-Bin Kou
  - Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China
- Huyi Feng
  - Chongqing Fifth People’s Hospital, Chongqing, China
- Yu Zhao
  - Guangzhou Laboratory, Guangzhou, China
- Shao Zhong
  - Department of Endocrinology, Kunshan Hospital Affiliated to Jiangsu University, Kunshan, China
7
Varela-Lasheras I, Perfeito L, Mesquita S, Gonçalves-Sá J. The effects of weather and mobility on respiratory viruses dynamics before and during the COVID-19 pandemic in the USA and Canada. PLOS Digital Health 2023; 2:e0000405. [PMID: 38127792 PMCID: PMC10734953 DOI: 10.1371/journal.pdig.0000405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 11/07/2023] [Indexed: 12/23/2023]
Abstract
The flu season is caused by a combination of different pathogens, including influenza viruses (IVs), which cause the flu, and non-influenza respiratory viruses (NIRVs), which cause common colds or influenza-like illness. These viruses exhibit similar dynamics, and meteorological conditions have historically been regarded as a principal modulator of their epidemiology, with outbreaks in the winter and almost no circulation during the summer in temperate regions. However, after the emergence of SARS-CoV-2 in late 2019, the dynamics of these respiratory viruses were strongly perturbed worldwide: some infections displayed near-eradication, while others experienced temporal shifts or occurred "off-season". This disruption raised questions regarding the dominant role of weather while also providing a unique opportunity to investigate the roles of different determinants in the epidemiological dynamics of IVs and NIRVs. Here, we employ statistical analysis and modelling to test the effects of weather and mobility on viral dynamics before and during the COVID-19 pandemic. Leveraging epidemiological surveillance data on several respiratory viruses from Canada and the USA, from 2016 to 2023, we found that whereas weather had a strong effect in the pre-COVID-19 pandemic period, in the pandemic period the effect of weather was strongly reduced and mobility played a more relevant role. These results, together with previous studies, indicate that behavioral changes resulting from the non-pharmacological interventions implemented to control SARS-CoV-2 interfered with the dynamics of other respiratory viruses, and that the past dynamical equilibrium was disturbed, and perhaps permanently altered, by the COVID-19 pandemic.
Affiliation(s)
- Irma Varela-Lasheras
  - Nova School of Business and Economics, Universidade Nova de Lisboa, Carcavelos, Portugal
- Lilia Perfeito
  - LIP, Laboratório de Instrumentação e Física Experimental de Partículas, Lisbon, Portugal
- Sara Mesquita
  - LIP, Laboratório de Instrumentação e Física Experimental de Partículas, Lisbon, Portugal
  - Nova Medical School, Universidade Nova de Lisboa, Lisbon, Portugal
- Joana Gonçalves-Sá
  - Nova School of Business and Economics, Universidade Nova de Lisboa, Carcavelos, Portugal
  - LIP, Laboratório de Instrumentação e Física Experimental de Partículas, Lisbon, Portugal
8
Zarkogianni K, Dervakos E, Filandrianos G, Ganitidis T, Gkatzou V, Sakagianni A, Raghavendra R, Max Nikias CL, Stamou G, Nikita KS. The smarty4covid dataset and knowledge base as a framework for interpretable physiological audio data analysis. Sci Data 2023; 10:770. [PMID: 37932314 PMCID: PMC10628219 DOI: 10.1038/s41597-023-02646-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Accepted: 10/10/2023] [Indexed: 11/08/2023]
Abstract
Harnessing the power of Artificial Intelligence (AI) and m-health to detect new biomarkers indicative of the onset and progress of respiratory abnormalities/conditions has attracted considerable scientific and research interest, especially during the COVID-19 pandemic. The smarty4covid dataset contains audio signals of cough (4,676), regular breathing (4,665), deep breathing (4,695) and voice (4,291), recorded on mobile devices following a crowd-sourcing approach. Other self-reported information is also included (e.g. COVID-19 virus tests), thus providing a comprehensive dataset for the development of COVID-19 risk detection models. The smarty4covid dataset is released in the form of a web-ontology language (OWL) knowledge base, enabling data consolidation from other relevant datasets, complex queries and reasoning. It has been used to develop models able to: (i) extract clinically informative respiratory indicators from regular breathing records, and (ii) identify cough, breath and voice segments in crowd-sourced audio recordings. A new framework that uses the smarty4covid OWL knowledge base to generate counterfactual explanations for opaque AI-based COVID-19 risk detection models is proposed and validated.
Affiliation(s)
- Konstantia Zarkogianni
  - National Technical University of Athens, School of Electrical and Computer Engineering, Athens, 157 80, Greece
  - Maastricht University, Faculty of Science and Engineering, Department of Advanced Computing Sciences, Maastricht, 6200 MD, Netherlands
- Edmund Dervakos
  - National Technical University of Athens, School of Electrical and Computer Engineering, Athens, 157 80, Greece
- George Filandrianos
  - National Technical University of Athens, School of Electrical and Computer Engineering, Athens, 157 80, Greece
- Theofanis Ganitidis
  - National Technical University of Athens, School of Electrical and Computer Engineering, Athens, 157 80, Greece
- Vasiliki Gkatzou
  - National Technical University of Athens, School of Electrical and Computer Engineering, Athens, 157 80, Greece
- Aikaterini Sakagianni
  - Sismanoglion General Hospital, Department of Intensive Care Unit, Athens, 15126, Greece
- Raghu Raghavendra
  - University of Southern California, Viterbi School of Engineering, Los Angeles, 90089, USA
- C L Max Nikias
  - University of Southern California, Viterbi School of Engineering, Los Angeles, 90089, USA
- Giorgos Stamou
  - National Technical University of Athens, School of Electrical and Computer Engineering, Athens, 157 80, Greece
- Konstantina S Nikita
  - National Technical University of Athens, School of Electrical and Computer Engineering, Athens, 157 80, Greece
  - University of Southern California, Viterbi School of Engineering, Los Angeles, 90089, USA