1
Yasin KH, Yasin MI, Iguala AD, Gelete TB, Kebede E. Methodological Integration of Machine Learning and Geospatial Analysis for PM10 Pollution Mapping. MethodsX 2025; 14:103322. [PMID: 40331028 PMCID: PMC12051153 DOI: 10.1016/j.mex.2025.103322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2025] [Accepted: 04/16/2025] [Indexed: 05/08/2025] Open
Abstract
Air pollution mitigation necessitates accurate spatial modelling to inform public health interventions. Traditional approaches inadequately capture complex predictor-pollutant interactions, whereas machine learning (ML) offers a superior capacity for modelling nonlinear relationships. This study compares three ML algorithms, Random Forest (RF), K-Nearest Neighbors (KNN), and Naïve Bayes (NB), using annual PM10 data from 11 monitoring stations alongside atmospheric, urban, and terrain covariates. The methodological framework employed rigorous preprocessing and cross-validation to classify pollution into three categorical levels. Results demonstrate RF's superior performance, achieving 94% balanced accuracy and 97% specificity, significantly outperforming KNN (92%) and NB (89%). RF excelled in capturing spatial heterogeneity and complex variable interactions, while KNN and NB exhibited limitations in managing feature dependencies and localized variability. Despite computational demands, the findings substantiate RF's reliability for robust air quality monitoring applications. The study contributes valuable insights for implementing scalable pollution prediction systems in resource-constrained urban environments while acknowledging interpretability challenges inherent to complex ML models.
- Preprocessing of spatial data from various sources, including the handling of missing/abnormal data, analysis, and normalization
- Implementation of the three ML algorithms with rigorous hyperparameter tuning, model validation, and performance assessment
- Mapping of PM10 hotspots along the gradient direction and distance from the city center
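The balanced accuracy and specificity figures quoted in this abstract can be derived from a multi-class confusion matrix. The sketch below is a minimal illustration, not the authors' code: it assumes balanced accuracy is the mean of per-class recall (some packages instead report the per-class average of sensitivity and specificity), and the confusion-matrix counts are invented for the example.

```python
def per_class_rates(confusion):
    """Return (recalls, specificities), one value per class.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    recalls, specificities = [], []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # true k, predicted other
        fp = sum(confusion[i][k] for i in range(n)) - tp  # other, predicted k
        tn = total - tp - fn - fp
        recalls.append(tp / (tp + fn))
        specificities.append(tn / (tn + fp))
    return recalls, specificities


def balanced_accuracy(confusion):
    """Mean per-class recall (assumed definition, see lead-in)."""
    recalls, _ = per_class_rates(confusion)
    return sum(recalls) / len(recalls)


def macro_specificity(confusion):
    """Unweighted mean of per-class specificity."""
    _, specificities = per_class_rates(confusion)
    return sum(specificities) / len(specificities)


# Hypothetical counts for three pollution levels (low, moderate, high).
example = [
    [8, 2, 0],
    [1, 8, 1],
    [0, 1, 9],
]

if __name__ == "__main__":
    print(round(balanced_accuracy(example), 4))
    print(round(macro_specificity(example), 4))
```

Averaging per class rather than over all samples keeps a rare pollution category from being swamped by the common ones, which matters when only 11 stations feed the classifier.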
Affiliation(s)
- Kalid Hassen Yasin
- Geo-Information Science Program, School of Geography and Environmental Studies, Haramaya University, P.O. Box 138, 3220 Dire Dawa, Ethiopia
- Muaz Ismael Yasin
- School of Medicine, College of Health and Medical Sciences, Haramaya University, P.O. Box 235, Harar, Ethiopia
- Tadele Bedo Gelete
- Geo-Information Science Program, School of Geography and Environmental Studies, Haramaya University, P.O. Box 138, 3220 Dire Dawa, Ethiopia
- Erana Kebede
- School of Plant Sciences, College of Agriculture and Environmental Sciences, Haramaya University, P.O. Box 138, Dire Dawa, Ethiopia
2
Lu QO, Lee CC. Innovative Geo-AI model: An enhance of outdoor PM estimations based on land use and outdoor environmental factors in a highly polluted area. CHEMOSPHERE 2025; 373:144178. [PMID: 39908842 DOI: 10.1016/j.chemosphere.2025.144178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2024] [Revised: 01/02/2025] [Accepted: 01/28/2025] [Indexed: 02/07/2025]
Abstract
Particulate matter (PM) is a critical component of overall pollutant exposure, but monitoring at the individual level remains impractical for large cohorts. This study aimed to identify PM sources in a highly polluted area in Taiwan and develop generalizable predictive models. We collected daily average PM data from Environmental Protection Administration (EPA) air quality monitoring stations, AirBox sensors, and EPA micro-stations in a highly polluted area of Taiwan, recorded between 2018 and 2020. Predictors were derived from various datasets, including EPA environmental resources, meteorological data, land use, road traffic facilities, social information, geospatial data, and landmark databases. Employing ensemble techniques, such as land-use regression (LUR), inverse distance weighting, and three machine learning algorithms (support vector machine, random forest, and multilayer perceptron), we predicted PM2.5 and PM10 levels. The selection of important variables involved Spearman's and Kendall's Tau correlation analyses, along with stepwise regression. The optimal outdoor predictive model developed herein was an ensemble with R2 values of 0.89 for PM2.5 and 0.87 for PM10. Such models may be effective for estimating individual PM exposure in epidemiological studies and serve as a framework for other countries. Notably, our study pioneers the application of LUR models in Southern Taiwan, enriching the general prediction of atmospheric pollutant distributions. This research provides a scientific basis for urban planning, air pollution management, public health policy, and potential early warning strategies.
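Inverse distance weighting, one of the techniques named in this abstract, estimates a value at an unmonitored location as a weighted average of nearby station readings, with weights that decay with distance. The following is a minimal sketch under stated assumptions: planar coordinates, a power parameter of 2, and invented station values. It is not taken from the study.

```python
import math


def idw(stations, target, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    stations: list of ((x, y), value) pairs in a planar coordinate system.
    target:   (x, y) location to estimate.
    power:    distance exponent; 2.0 is a common default (an assumption here).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target coincides with a station: return its reading directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical PM2.5 readings at two stations, 2 km apart.
demo_stations = [((0.0, 0.0), 10.0), ((2.0, 0.0), 30.0)]

if __name__ == "__main__":
    print(idw(demo_stations, (1.0, 0.0)))  # midpoint, equal weights
```

A larger `power` makes the estimate follow the nearest station more closely; a smaller one smooths across the whole network.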
Affiliation(s)
- Quang-Oai Lu
- Department of Environmental and Occupational Health, College of Medicine, National Cheng Kung University, Tainan, 704, Taiwan
- Ching-Chang Lee
- Department of Environmental and Occupational Health, College of Medicine, National Cheng Kung University, Tainan, 704, Taiwan; Research Center of Environmental Trace Toxic Substances, College of Medicine, National Cheng Kung University, Tainan, 704, Taiwan.
3
Liu Z, Huang X, Wang X. PM2.5 prediction based on modified whale optimization algorithm and support vector regression. Sci Rep 2024; 14:23296. [PMID: 39375472 PMCID: PMC11458793 DOI: 10.1038/s41598-024-74122-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2024] [Accepted: 09/24/2024] [Indexed: 10/09/2024] Open
Abstract
To characterize the variation of PM2.5 concentrations in the atmosphere of Nanchang City, we build a hybrid model that couples Support Vector Regression (SVR) with a modified Whale Optimization Algorithm (WOA), referred to as the mWOA-SVR model, to predict the PM2.5 concentration. First, the Pearson correlation coefficient (PCC) method was used to examine the dynamic relationship between air pollutants and meteorological factors; PM10, SO2, and CO were selected as air pollutant concentration features, while daily maximum and minimum temperatures and wind power levels were selected as meteorological features. Then, the modified WOA was used to select the parameters of the SVR model, and four sets of better parameter combinations were found. Finally, the mWOA-SVR model was built with these four parameter sets to predict the PM2.5 concentration. The results show that the prediction accuracy of the hybrid mWOA-SVR model using pollutant concentrations plus weather factors as features was higher than that using pollutant concentrations alone.
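The Pearson-based feature screening described in this abstract can be illustrated with a short sketch. The threshold of 0.5, the helper name `select_features`, and the sample values are assumptions made for the example; the abstract does not state the cut-off the authors used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def select_features(features, target, threshold=0.5):
    """Keep features whose absolute correlation with the target meets the threshold.

    `threshold` is a hypothetical cut-off, not a value from the source.
    """
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Invented daily values: one candidate tracks PM2.5, the other does not.
pm25 = [2.0, 4.0, 6.0, 8.0]
candidates = {
    "pm10": [1.0, 2.0, 3.0, 4.0],
    "unrelated": [1.0, -1.0, -1.0, 1.0],
}

if __name__ == "__main__":
    print(select_features(candidates, pm25))
```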
Affiliation(s)
- Zuhan Liu
- School of Information Engineering, Nanchang Institute of Technology, Nanchang, 330099, China.
- Jiangxi Province Key Laboratory of Smart Water Conservancy, Nanchang, 330099, China.
- Xin Huang
- School of Information Engineering, Nanchang Institute of Technology, Nanchang, 330099, China
- Xing Wang
- School of Information Engineering, Nanchang Institute of Technology, Nanchang, 330099, China
4
Rautela KS, Goyal MK. Transforming air pollution management in India with AI and machine learning technologies. Sci Rep 2024; 14:20412. [PMID: 39223178 PMCID: PMC11369276 DOI: 10.1038/s41598-024-71269-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 08/26/2024] [Indexed: 09/04/2024] Open
Abstract
A comprehensive approach is essential in India's ongoing battle against air pollution, combining technological advancements, regulatory reinforcement, and widespread societal engagement. Bridging technological gaps involves deploying sophisticated pollution control technologies and addressing the rural-urban disparity through innovative solutions. The review found that integrating Artificial Intelligence and Machine Learning (AI&ML) in air quality forecasting demonstrates promising results with a remarkable model efficiency. In this study, we first compute the PM2.5 concentration over India using the surface mass concentrations of five key aerosols: black carbon (BC), dust (DU), organic carbon (OC), sea salt (SS), and sulphates (SU). The study identifies several regions highly vulnerable to PM2.5 pollution due to specific sources. The Indo-Gangetic Plains are notably impacted by high concentrations of BC, OC, and SU resulting from anthropogenic activities. Western India experiences higher DU concentrations due to its proximity to the Sahara Desert. Additionally, certain areas in northeast India show significant contributions of OC from biogenic activities. Moreover, an AI&ML model based on convolutional autoencoder architecture underwent rigorous training, testing, and validation to forecast PM2.5 concentrations across India. The results reveal its exceptional precision in PM2.5 prediction, as demonstrated by model evaluation metrics, including a Structural Similarity Index exceeding 0.60, Peak Signal-to-Noise Ratio ranging from 28-30 dB and Mean Square Error below 10 μg/m3. However, regulatory challenges persist, necessitating robust frameworks and consistent enforcement mechanisms, as evidenced by the complexities in predicting PM2.5 concentrations.
Implementing tailored regional pollution control strategies, integrating AI&ML technologies, strengthening regulatory frameworks, promoting sustainable practices, and encouraging international collaboration are essential policy measures to mitigate air pollution in India.
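Two of the grid-level metrics quoted in this abstract, mean square error and peak signal-to-noise ratio, are straightforward to compute for a predicted concentration map against a reference map. The sketch below is illustrative only: it uses the standard definition PSNR = 10 log10(range² / MSE), and the 2x2 grids and the data range of 100 are invented.

```python
import math


def mse(predicted, reference):
    """Mean square error between two equally shaped 2-D grids (lists of rows)."""
    flat_pred = [value for row in predicted for value in row]
    flat_ref = [value for row in reference for value in row]
    return sum((p - r) ** 2 for p, r in zip(flat_pred, flat_ref)) / len(flat_ref)


def psnr(predicted, reference, data_range):
    """Peak signal-to-noise ratio in dB.

    data_range is the span of valid values (max minus min); the caller must
    supply it, and the value used below is an assumption for the example.
    """
    error = mse(predicted, reference)
    if error == 0:
        return math.inf  # identical grids
    return 10.0 * math.log10(data_range ** 2 / error)


# Hypothetical PM2.5 maps.
reference_map = [[0.0, 100.0], [50.0, 50.0]]
predicted_map = [[10.0, 100.0], [50.0, 40.0]]

if __name__ == "__main__":
    print(mse(predicted_map, reference_map))
    print(round(psnr(predicted_map, reference_map, 100.0), 2))
```

Because PSNR depends on the chosen data range, values are only comparable between studies that normalize their maps the same way.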
Affiliation(s)
- Kuldeep Singh Rautela
- Department of Civil Engineering, Indian Institute of Technology Indore, Simrol, Indore, 453552, Madhya Pradesh, India
- Manish Kumar Goyal
- Department of Civil Engineering, Indian Institute of Technology Indore, Simrol, Indore, 453552, Madhya Pradesh, India.
5
Gündoğdu S, Elbir T. Elevating hourly PM2.5 forecasting in Istanbul, Türkiye: Leveraging ERA5 reanalysis and genetic algorithms in a comparative machine learning model analysis. CHEMOSPHERE 2024; 364:143096. [PMID: 39146993 DOI: 10.1016/j.chemosphere.2024.143096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Revised: 08/07/2024] [Accepted: 08/13/2024] [Indexed: 08/17/2024]
Abstract
Rapid urbanization and industrialization have intensified air pollution, posing severe health risks and necessitating accurate PM2.5 predictions for effective urban air quality management. This study distinguishes itself by utilizing high-resolution ERA5 reanalysis data for a grid-based spatial analysis of Istanbul, Türkiye, a densely populated city with diverse pollutant sources. It assesses the predictive accuracy of advanced machine learning (ML) models-Multiple Linear Regression (MLR), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting (LGB), Random Forest (RF), and Nonlinear Autoregressive with Exogenous Inputs (NARX). Notably, it introduces genetic algorithm optimization for the NARX model to enhance its performance. The models were trained on hourly PM2.5 concentrations from twenty monitoring stations across 2020-2021. Istanbul was divided into seven regions based on ERA5 grid distributions to examine PM2.5 spatial variability. Seventeen input variables from ERA5, including meteorological, land cover, and vegetation parameters, were analyzed using the Neighborhood Component Analysis (NCA) method to identify the most predictive variables. Comparative analysis showed that while all models provided valuable insights (RF > LGB > XGB > MLR), the NARX model outperformed them, particularly with the complex dataset used. The NARX model achieved a high R-value (0.89), low RMSE (5.24 μg/m³), and low MAE (2.94 μg/m³). It performed best in autumn and winter, with the highest accuracy in Region-1 (R-value 0.94) and the lowest in Region-5 (R-value 0.75). This study's success in a complex urban setting with limited monitoring underscores the robustness of the NARX model and the methodology's potential for global application in similar urban contexts. 
By addressing temporal and spatial variability in air quality predictions, this research sets a new benchmark and highlights the importance of advanced data analysis techniques for developing targeted pollution control strategies and public health policies.
Affiliation(s)
- Serdar Gündoğdu
- Department of Computer Technologies, Bergama Vocational School, Dokuz Eylul University, Bergama, Izmir, 35700, Türkiye.
- Tolga Elbir
- Department of Environmental Engineering, Faculty of Engineering, Dokuz Eylul University, Buca, Izmir, 35390, Türkiye; Dokuz Eylul University, Environmental Research and Application Center (ÇEVMER), Tinaztepe Campus, 35390, Buca, Izmir, Türkiye.
6
Mutlu A, Aydın Keskin G, Çıldır İ. Predicting hospital admissions for upper respiratory tract complaints: An artificial neural network approach integrating air pollution and meteorological factors. ENVIRONMENTAL MONITORING AND ASSESSMENT 2024; 196:759. [PMID: 39046576 DOI: 10.1007/s10661-024-12908-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Accepted: 07/11/2024] [Indexed: 07/25/2024]
Abstract
This study uses artificial neural networks (ANNs) to examine the intricate relationship between air pollutants, meteorological factors, and respiratory disorders. The study investigates the correlation between hospital admissions for respiratory diseases and the levels of PM10 and SO2 pollutants, as well as local meteorological conditions, using data from 2017 to 2019. The objective of this study is to clarify the impact of air pollution on the well-being of the general population, specifically focusing on respiratory ailments. An ANN called a multilayer perceptron (MLP) was used. The network was trained using the Levenberg-Marquardt (LM) backpropagation algorithm. The data revealed a substantial increase in hospital admissions for upper respiratory tract diseases, amounting to a total of 11,746 cases. There were clear seasonal fluctuations, with fall having the highest number of cases of bronchitis (N = 181), sinusitis (N = 83), and upper respiratory infections (N = 194). The study also found demographic differences, with females and people aged 18 to 65 years having greater admission rates. The performance of the ANN model, measured using R2 values, demonstrated a high level of predictive accuracy. Specifically, the R2 value was 0.91675 during training, 0.99182 during testing, and 0.95287 for validating the prediction of asthma. The comparative analysis revealed that the ANN-MLP model provided the most optimal result. The results emphasize the effectiveness of ANNs in representing the complex relationships between air quality, climatic conditions, and respiratory health. The results offer crucial insights for formulating focused healthcare policies and treatments to alleviate the detrimental impact of air pollution and meteorological factors.
Affiliation(s)
- Atilla Mutlu
- Department of Environmental Engineering, College of Engineering, Balikesir University, Balikesir, Turkey.
- Gülşen Aydın Keskin
- Department of Industrial Engineering, College of Engineering, Balikesir University, Balikesir, Turkey
- İhsan Çıldır
- Ministry of Health Edremit State Hospital, Edremit, Balikesir, Turkey
7
Orhan N. Predicting deep well pump performance with machine learning methods during hydraulic head changes. Heliyon 2024; 10:e31505. [PMID: 38828352 PMCID: PMC11140612 DOI: 10.1016/j.heliyon.2024.e31505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Revised: 05/14/2024] [Accepted: 05/16/2024] [Indexed: 06/05/2024] Open
Abstract
In this study, machine learning techniques were employed to estimate and predict the system efficiency of a pumping plant at various hydraulic head levels. The measured parameters, including flow rate, outlet pressure, drawdown, and power, were used for estimating the system efficiency. Two approaches, Approach-I and Approach-II, were utilized. Approach-I incorporated additional parameters such as hydraulic head, drawdown, flow, power, and outlet pressure, while Approach-II focused solely on hydraulic head, outlet pressure, and power. Seven machine learning algorithms were employed to model and predict the efficiency of the pumping plant. The decrease in the hydraulic head by 125 cm resulted in a reduction in the pump system efficiency by 6.45 %, 8.94 %, and 13.8 % at flow rates of 40, 50, and 60 m3 h-1, respectively. Among the algorithms used in Approach-I, the artificial neural network, support vector machine regression, and lasso regression exhibited the highest performance, with R2 values of 0.995, 0.987, and 0.985, respectively. The corresponding RMSE values for these algorithms were 0.13 %, 0.23 %, and 0.22 %, while the MAE values were 0.11 %, 0.2 %, and 0.32 %, and the MAPE values were 0.22 %, 0.5 %, and 0.46 %. In Approach-II, the artificial neural network model once again demonstrated the best performance with an R2 value of 0.996, followed by the support vector machine regression (R2 = 0.988) and the decision tree regression (R2 = 0.981). Overall, the artificial neural network model proved to be the most effective in both approaches. These findings highlight the potential of machine learning techniques in predicting the efficiency of pumping plant systems.
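The four scores reported in this abstract (R2, RMSE, MAE, MAPE) follow standard definitions, sketched below for reference. The function name `regression_metrics` and the efficiency values are invented for the illustration; R2 is taken here as 1 minus the ratio of residual to total sum of squares, which may differ from a squared correlation coefficient if that is what the study reported.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (in percent), and R2 for paired observations."""
    n = len(y_true)
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when a true value is zero; the inputs here are assumed nonzero.
    mape = 100.0 * sum(abs(e / true) for e, true in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((true - mean_true) ** 2 for true in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


# Hypothetical measured and predicted system efficiencies, in percent.
measured = [50.0, 60.0, 70.0]
predicted = [52.0, 60.0, 67.0]

if __name__ == "__main__":
    for name, value in regression_metrics(measured, predicted).items():
        print(name, round(value, 4))
```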
Affiliation(s)
- Nuri Orhan
- Selçuk University, Faculty of Agriculture, Department of Agricultural Machinery and Technology Engineering, 42140, Konya, Turkiye
8
Yin Y, Ahmadianfar I, Karim FK, Elmannai H. Advanced forecasting of COVID-19 epidemic: Leveraging ensemble models, advanced optimization, and decomposition techniques. Comput Biol Med 2024; 175:108442. [PMID: 38678939 DOI: 10.1016/j.compbiomed.2024.108442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2024] [Revised: 03/25/2024] [Accepted: 04/07/2024] [Indexed: 05/01/2024]
Abstract
In the global effort to address the outbreak of the new coronavirus pneumonia (COVID-19) pandemic, accurate forecasting of epidemic patterns has become crucial for implementing successful interventions aimed at preventing and controlling the spread of the disease. The correct prediction of the course of COVID-19 outbreaks is a complex and challenging task, mainly because of the significant volatility in the data series related to COVID-19. Previous studies have been limited by the exclusive use of individual forecasting techniques in epidemic modeling, disregarding the integration of diverse prediction procedures. The lack of attention to detail in this situation can yield worse-than-ideal results. Consequently, this study introduces a novel ensemble framework that integrates three machine learning methods (kernel ridge regression (KRidge), deep random vector functional link (dRVFL), and ridge regression) within a linear relationship (L-KRidge-dRVFL-Ridge). The optimization of this framework is accomplished through a distinctive approach, specifically adaptive differential evolution and particle swarm optimization (A-DEPSO). Moreover, an effective decomposition method, known as time-varying filter empirical mode decomposition (TVF-EMD), is employed to decompose the input variables. A feature selection technique, specifically using the light gradient boosting machine (LGBM), is also implemented to extract the most influential input variables. The daily datasets of COVID-19 collected from two countries, namely Italy and Poland, were used as the experimental examples. Additionally, all models are implemented to forecast COVID-19 at two time horizons: 10 and 14 days ahead (t+10 and t+14). According to the results, the proposed model yields a higher correlation coefficient (R) for both case studies, Italy (t+10 = 0.965, t+14 = 0.961) and Poland (t+10 = 0.952, t+14 = 0.940), than the other models.
The experimental results demonstrate that the model suggested in this paper has outstanding results in various kinds of complex epidemic prediction situations. The proposed ensemble model demonstrates exceptional accuracy and resilience, outperforming all similar models in terms of efficacy.
Affiliation(s)
- Yingyu Yin
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, 510665, China.
- Iman Ahmadianfar
- Information and Communication Technology Research Group, Scientific Research Center, Al-Ayen University, Thi-Qar, Nasiriyah, 64001, Iraq.
- Faten Khalid Karim
- Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O.BOX 84428, Riyadh 11671, Saudi Arabia.
- Hela Elmannai
- Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O.BOX 84428, Riyadh 11671, Saudi Arabia.
9
Wiora A, Wiora J, Kasprzyk J. Indication Variability of the Particulate Matter Sensors Dependent on Their Location. SENSORS (BASEL, SWITZERLAND) 2024; 24:1683. [PMID: 38475219 PMCID: PMC10935032 DOI: 10.3390/s24051683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2023] [Revised: 02/20/2024] [Accepted: 03/01/2024] [Indexed: 03/14/2024]
Abstract
Particulate matter (PM) suspended in the air significantly impacts human health. Those of anthropogenic origin are particularly hazardous. Poland is one of the countries where the air quality during the heating season is the worst in Europe. Air quality in small towns and villages far from state monitoring stations is often much worse than in larger cities where they are located. Their residents inhale the air containing smoke produced mainly by coal-fired stoves. In the frame of this project, an air quality monitoring network was built. It comprises low-cost PMS7003 PM sensors and ESP8266 microcontrollers with integrated Wi-Fi communication modules. This article presents research results on the influence of the PM sensor location on their indications. It has been shown that the indications from sensors several dozen meters away from each other can differ by up to tenfold, depending on weather conditions and the source of smoke. Therefore, measurements performed by a network of sensors, even of worse quality, are much more representative than those conducted in one spot. The results also indicated the method of detecting a sudden increase in air pollutants. In the case of smokiness, the difference between the mean and median indications of the PM sensor increases even up to 400 µg/m3 over a 5 min time window. Information from this comparison suggests a sudden deterioration in air quality and can allow for quick intervention to protect people's health. This method can be used in protection systems where fast detection of anomalies is necessary.
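The detection idea in this abstract, that a short smoke episode pulls the mean of a time window far above its median, can be sketched with a sliding window. This is an assumed reading of the method, not the authors' implementation: the window length in samples, the alert threshold, and the readings are all invented, and the function name `smoke_alerts` is hypothetical.

```python
from statistics import mean, median


def smoke_alerts(readings, window, threshold):
    """Return indices of the last sample of each window where mean - median > threshold.

    readings:  PM values in µg/m3, sampled at a fixed interval.
    window:    number of samples per window (how many samples make up the
               5 min window depends on the sensor's sampling rate).
    threshold: alert level for the mean-median gap; a hypothetical choice.
    """
    alerts = []
    for end in range(window, len(readings) + 1):
        chunk = readings[end - window:end]
        if mean(chunk) - median(chunk) > threshold:
            alerts.append(end - 1)
    return alerts


# Invented readings with a single smoke spike at index 4.
spiky = [20, 22, 21, 23, 900, 25, 24]
steady = [20, 21, 22, 21, 20]

if __name__ == "__main__":
    print(smoke_alerts(spiky, window=5, threshold=100))
    print(smoke_alerts(steady, window=5, threshold=100))
```

The median barely moves when one sample spikes, while the mean jumps, so the gap between them responds to sudden events without reacting to slow drifts in the baseline.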
Affiliation(s)
- Jerzy Kasprzyk
- Department of Measurements and Control Systems, Silesian University of Technology, ul. Akademicka 16, 44-100 Gliwice, Poland; (A.W.); (J.W.)
10
Wang S, McGibbon J, Zhang Y. Predicting high-resolution air quality using machine learning: Integration of large eddy simulation and urban morphology data. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2024; 344:123371. [PMID: 38266694 DOI: 10.1016/j.envpol.2024.123371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 01/15/2024] [Accepted: 01/15/2024] [Indexed: 01/26/2024]
Abstract
Accurately predicting air pollutants, especially in urban areas with well-defined spatial structures, is crucial. Over the past decade, machine learning techniques have been widely used to forecast urban air quality. However, traditional machine learning approaches have limitations in accuracy and interpretability for predicting pollutants. In this study, we propose a convolutional neural network (CNN) model to predict the spatial distribution of CO concentration in the Nanjing urban area at 10 m resolution. Our model incorporates various factors as input, such as building height, topography, and emissions, and is trained against the outputs simulated by the parallelized large-eddy simulation model (PALM). The PALM model has 48 different scenarios that varied in emissions, wind speeds, and wind directions. The results display a strong consistency between the two models. Furthermore, we evaluate the performance of our model using a 10-fold cross-validation and out-of-sample cross-validation approach. This yields a robust correlation (with both R2 > 0.8) and a low RMSE between the CO predicted by the PALM and CNN models, which demonstrates the generalization capability of our CNN model. The CNN can extract crucial features from the resulting weight contribution map. This map indicates that the CO concentration at a location is more influenced by nearby buildings and emissions than distant ones. The interpretable patterns uncovered by our model are related to neighborhood effects, wind speeds, directions, and the impact of orientation on urban CO distribution. The model also shows high prediction accuracy (R > 0.8) when applied to another city. Overall, the integration of our CNN framework with the PALM model enhances the accuracy of air quality predictions while enabling an interpretation grounded in fluid dynamic laws, providing effective tools for air quality management.
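The 10-fold cross-validation mentioned in this abstract partitions the samples into ten groups and holds each group out once while training on the rest. The sketch below shows only the index bookkeeping. Splitting over 48 items mirrors the number of PALM scenarios purely as an illustration; whether the authors split by scenario, by grid tile, or otherwise is not stated in the abstract.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Indices are shuffled once with a fixed seed so the split is reproducible.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for fold_number, (train, test) in enumerate(k_fold_indices(48, k=10)):
        print(fold_number, len(train), len(test))
```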
Affiliation(s)
- Shibao Wang
- School of Atmospheric Sciences, Nanjing University, Nanjing, Jiangsu, China
- Yanxu Zhang
- School of Atmospheric Sciences, Nanjing University, Nanjing, Jiangsu, China.
11
Hongliang G, Zhiyao Z, Ahmadianfar I, Escorcia-Gutierrez J, Aljehane NO, Li C. Multi-step influenza forecasting through singular value decomposition and kernel ridge regression with MARCOS-guided gradient-based optimization. Comput Biol Med 2024; 169:107888. [PMID: 38157778 DOI: 10.1016/j.compbiomed.2023.107888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 11/28/2023] [Accepted: 12/18/2023] [Indexed: 01/03/2024]
Abstract
This research delves into the significance of influenza outbreaks in public health, particularly the importance of accurate forecasts using weekly Influenza-like illness (ILI) rates. The present work develops a novel hybrid machine-learning model by combining singular value decomposition with kernel ridge regression (SKRR). In this context, a novel hybrid model known as H-SKRR is developed by combining two robust forecasting approaches, SKRR and ridge regression, which aims to improve multi-step-ahead predictions for weekly ILI rates in Southern and Northern China. The study begins with feature selection via XGBoost in the preprocessing phase, identifying optimal precursor information guided by importance factors. It decomposes the original signal using multivariate variational mode decomposition (MVMD) to address non-stationarity and complexity. H-SKRR is implemented by incorporating significant lagged-time components across sub-components. The aggregated forecasted values from these sub-components generate ILI values for two horizons (i.e., 4 and 7 weeks ahead). The gradient-based optimization (GBO) algorithm is employed to fine-tune model parameters. Furthermore, the deep random vector functional link (dRVFL), Ridge regression, and gated recurrent unit neural network (GRU) models were employed to validate the MVMD-H-SKRR-GBO paradigm's effectiveness. The outcomes, assessed using the MARCOS (Measurement of alternatives and ranking according to compromise solution) method as a multi-criteria decision-making method, highlight the superior accuracy of the MVMD-H-SKRR-GBO model in predicting ILI rates. The results clearly highlight the exceptional performance of the MVMD-H-SKRR-GBO model, with outstanding precision demonstrated by impressive R, RMSE, IA, and U95 % values of 0.946, 0.388, 0.970, and 1.075, respectively, at t + 7.
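Multi-step-ahead forecasting from lagged components, as described in this abstract, starts by turning a time series into input/target pairs: a window of past values as input and the value a fixed number of steps ahead as target. The sketch below shows that framing only. The function name `make_supervised`, the lag count, and the toy series are assumptions; the decomposition, feature selection, and regression stages of the paper are not reproduced.

```python
def make_supervised(series, n_lags, horizon):
    """Pair each window of n_lags consecutive values with the value `horizon` steps later.

    Returns (inputs, targets), where inputs[i] holds the n_lags values ending
    at time t and targets[i] is series[t + horizon].
    """
    inputs, targets = [], []
    for t in range(n_lags - 1, len(series) - horizon):
        inputs.append(series[t - n_lags + 1:t + 1])
        targets.append(series[t + horizon])
    return inputs, targets


# Toy weekly series standing in for ILI rates.
weekly = list(range(10))

if __name__ == "__main__":
    lagged_inputs, future_values = make_supervised(weekly, n_lags=3, horizon=4)
    for window, target in zip(lagged_inputs, future_values):
        print(window, "->", target)
```

Calling it once with `horizon=4` and once with `horizon=7` would give separate training sets for the two forecast horizons mentioned above, which is one common way to set up direct multi-step forecasting.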
Affiliation(s)
- Guo Hongliang
- College of Information Technology, Jilin Agricultural University, Changchun, 130118, China.
- Zhang Zhiyao
- College of Information Technology, Jilin Agricultural University, Changchun, 130118, China.
- Iman Ahmadianfar
- Information and Communication Technology Research Group, Scientific Research Center, Al-Ayen University, Thi-Qar, Nasiriyah, 64001, Iraq.
- José Escorcia-Gutierrez
- Department of Computational Science and Electronics, Universidad de La Costa, CUC, Barranquilla, 080002, Colombia.
- Nojood O Aljehane
- Faculty of Computers and Information Technology, University of Tabuk, Tabuk, Saudi Arabia, Tabuk University, KSA.
- Chengye Li
- Department of Pulmonary and Critical Care Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China.
12
Bahauddin M, Baltaci H, Onat B. The role of large-scale atmospheric circulations on long-term variations of PM10 concentrations over Turkey. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2024; 31:1260-1275. [PMID: 38038918 DOI: 10.1007/s11356-023-31164-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 11/17/2023] [Indexed: 12/02/2023]
Abstract
PM10 is widely identified as an important atmospheric pollutant that poses a serious threat to human health and the environment and also influences the climate system. To uncover the mechanisms behind its sources and circulation in the environment, this study focuses on the role of large-scale atmospheric circulation in the long-term variability of PM10 over Turkey by applying rotated empirical orthogonal functions (REOF) analysis. When REOF was applied to the daily PM10 data from 80 air quality stations for the period 2010-2020, the first REOF mode (REOF1, 44.9% in winter, 43.2% in spring, 39.5% in summer and 31.6% in fall) indicated, for all four seasons, the role of local emission sources in the variations of PM10, which show high PM10 values in different geographical regions. The results of the second mode (REOF2, 17.9% in winter, 14.0% in spring, 14.0% in summer and 16.3% in fall) indicate the role of large-scale atmospheric circulations in the values of PM10. From the REOF2 analysis and extracted synoptic composite maps, the strength of southerly winds and the presence of southwesterly winds at low levels are very important in transporting dust pollutants from the Arabian Peninsula and Northern Africa, respectively, to the eastern (EAR) and southeastern (SEAR) regions of Turkey during winter. In spring, sand particles in the interior terrestrial part of the country are carried to the northern regions by large-scale southerly winds, which cause above-normal PM10 concentrations in the Black Sea region of Turkey. In summer, dust particles originating from the Caspian Sea area, together with warm dry air, intrude into the eastern region of Turkey on strong easterly winds and result in high PM10 values.
Our findings emphasize that the long-term variations in air quality over Turkey are affected secondary by the variations in the large-scale atmospheric circulations with primary contributions from the changes in local emission sources.
Affiliation(s)
- Mir Bahauddin
  - Environmental Engineering Department, Engineering Faculty, Istanbul University-Cerrahpasa, Avcılar, 34320, Istanbul, Turkey
- Hakki Baltaci
  - Institute of Earth and Marine Sciences, Gebze Technical University, Gebze, Kocaeli, Turkey
- Burcu Onat
  - Environmental Engineering Department, Engineering Faculty, Istanbul University-Cerrahpasa, Avcılar, 34320, Istanbul, Turkey
13
Scott-Fordsmand JJ, Amorim MJB. Using Machine Learning to make nanomaterials sustainable. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 859:160303. [PMID: 36410486 DOI: 10.1016/j.scitotenv.2022.160303] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 07/22/2022] [Revised: 11/06/2022] [Accepted: 11/15/2022] [Indexed: 06/16/2023]
Abstract
Sustainable development is a key challenge for contemporary human societies; failure to achieve sustainability could threaten human survival. In this review article, we illustrate how Machine Learning (ML) could support more sustainable development, covering the basics of data gathering through each step of the Environmental Risk Assessment (ERA). The literature provides several examples showing how ML can be employed in most steps of a typical ERA. A key observation is that there is currently no clear guidance for using such autonomous technologies in ERAs or on which standards/checks are required. Steering thus seems to be the most important task for supporting the use of ML in the ERA of nano- and smart-materials. Resources should be devoted to developing a strategy for implementing ML in ERA with a strong emphasis on data foundations, methodologies, and the related sensitivities/uncertainties. We should recognise historical errors and biases (e.g., in data) to avoid embedding them during ML programming.
Affiliation(s)
- Mónica J B Amorim
  - Department of Biology & CESAM, University of Aveiro, 3810-193 Aveiro, Portugal
14
Wai KM, Yu PKN. Application of a Machine Learning Method for Prediction of Urban Neighborhood-Scale Air Pollution. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2023; 20:2412. [PMID: 36767778 PMCID: PMC9915966 DOI: 10.3390/ijerph20032412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Revised: 01/20/2023] [Accepted: 01/24/2023] [Indexed: 06/18/2023]
Abstract
Urban air pollution has attracted growing attention due to its associated adverse health effects. A model that can promptly predict urban air quality with considerable accuracy is therefore important and will benefit the development of smart cities. However, only a computational fluid dynamics (CFD) model could better resolve the dispersion behavior within an urban canyon layer. A machine learning (ML) model using the Artificial Neural Network (ANN) approach was formulated in the current study to investigate vehicle-derived airborne particulate (PM10) dispersion within a compact high-rise-built environment. Various measured meteorological parameters and PM10 concentrations were adopted as the model inputs to train the ANN model. A building-resolved CFD model under the same environmental settings was also set up to compare its performance with that of the ANN model. Our results showed that the ANN model exhibited promising performance (r = 0.82, fractional bias = 0.002) when compared against more than 1000 h of PM10 measurements. When comparing the diurnal hourly measured PM10 variations on a clear-sky day, both the ANN and CFD models performed well (r > 0.8). The good performance of the CFD model relied on knowledge of the in situ diurnal traffic profile, the adoption of suitable mobile source emission factor(s) (e.g., from MOBILE 6 and COPERT4), and the use of urban thermal and dynamical variables to capture PM10 variations in both neutral and unstable atmospheric conditions. These requirements/constraints make the CFD model impractical for daily operation. In contrast, the ML (ANN) model adopted here is free from these constraints and is fast (less than 0.1% computational time relative to the CFD model). These results demonstrate that the ANN model is a superior option for a smart city application.
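The two agreement statistics quoted in this abstract (Pearson's r and fractional bias) can be sketched as follows. This is an illustrative sketch only, not the authors' code: the function names and sample values are hypothetical, and the fractional bias uses one common convention, 2(mean_obs - mean_pred)/(mean_obs + mean_pred), whose sign convention varies between studies.

```python
from math import sqrt


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    covariance = sum((o - mean_obs) * (p - mean_pred)
                     for o, p in zip(observed, predicted))
    spread_obs = sqrt(sum((o - mean_obs) ** 2 for o in observed))
    spread_pred = sqrt(sum((p - mean_pred) ** 2 for p in predicted))
    return covariance / (spread_obs * spread_pred)


def fractional_bias(observed, predicted):
    """Fractional bias, assuming FB = 2 * (mean_obs - mean_pred) / (mean_obs + mean_pred).

    Zero means no mean bias; with this (assumed) sign convention a negative
    value means the model over-predicts on average.
    """
    mean_obs = sum(observed) / len(observed)
    mean_pred = sum(predicted) / len(predicted)
    return 2.0 * (mean_obs - mean_pred) / (mean_obs + mean_pred)


# Hypothetical hourly PM10 values (ug/m3), for illustration only.
measured = [42.0, 55.0, 61.0, 48.0, 39.0]
modelled = [40.0, 57.0, 59.0, 50.0, 41.0]
print(pearson_r(measured, modelled), fractional_bias(measured, modelled))
```

A fractional bias near zero, as reported above, says only that the mean levels agree; r is still needed to judge whether the hour-to-hour variation is tracked.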
15
Yu X, Wang Q, Wei J, Zeng Q, Xiao L, Ni H, Xu T, Wu H, Guo P, Zhang X. Impacts of traffic-related particulate matter pollution on semen quality: A retrospective cohort study relying on the random forest model in a megacity of South China. THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 851:158387. [PMID: 36049696 DOI: 10.1016/j.scitotenv.2022.158387] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/10/2022] [Revised: 08/17/2022] [Accepted: 08/25/2022] [Indexed: 02/05/2023]
Abstract
BACKGROUND: Emerging evidence shows the detrimental impacts of particulate matter (PM) on semen quality. High-resolution estimates of PM concentrations are conducive to evaluating accurate associations between traffic-related PM exposure and semen quality. METHODS: In this study, we first developed a random forest model incorporating meteorological factors, land-use information, traffic-related variables, and other spatiotemporal predictors to estimate daily traffic-related PM concentrations, including PM2.5, PM10, and PM1. Then we enrolled 1310 semen donors, corresponding to 4912 semen samples, during the study period from January 1, 2019, to December 31, 2019, in Guangzhou city, China. Linear mixed models were employed to associate individual exposures to traffic-related PM during the entire (0-90 lag days) and key periods (0-37 and 34-77 lag days) with semen quality parameters, including sperm concentration, sperm count, progressive motility and total motility. RESULTS: The results showed that decreased sperm concentration was associated with PM10 exposures (β: -0.21, 95 % CI: -0.35, -0.07), and sperm count was inversely related to both PM2.5 (β: -0.19, 95 % CI: -0.35, -0.02) and PM10 (β: -0.19, 95 % CI: -0.33, -0.05) during the 0-90 days lag exposure window. In addition, PM2.5 and PM10 might diminish sperm concentration mainly by affecting the late phase of sperm development (0-37 lag days). Stratified analyses suggested that PBF and drinking seemed to modify the associations between PM exposure and sperm motility. We did not observe any significant associations of PM1 exposures with semen parameters. CONCLUSION: Our results indicate that exposure to traffic-related PM2.5 and PM10 pollution throughout spermatogenesis may adversely affect semen quality, especially sperm concentration and count. The findings provide more evidence for the negative associations between traffic-related PM exposure and semen quality, highlighting the necessity to reduce ambient air pollution through environmental policy.
Affiliation(s)
- Xiaolin Yu
  - Department of Preventive Medicine, Shantou University Medical College, No. 22 Xinling Road, Shantou 515041, China
- Qiling Wang
  - National Health Commission Key Laboratory of Male Reproduction and Genetics, Guangzhou, China; Department of Andrology, Guangdong Provincial Reproductive Science Institute (Guangdong Provincial Fertility Hospital), China
- Jing Wei
  - State Key Laboratory of Remote Sensing Science, College of Global Change and Earth System Science, Beijing Normal University, Beijing, China; Department of Atmospheric and Oceanic Science, Earth System Science Interdisciplinary Center, University of Maryland, College Park, MD, USA
- Qinghui Zeng
  - Department of Preventive Medicine, Shantou University Medical College, No. 22 Xinling Road, Shantou 515041, China
- Lina Xiao
  - Department of Preventive Medicine, Shantou University Medical College, No. 22 Xinling Road, Shantou 515041, China
- Haobo Ni
  - Department of Preventive Medicine, Shantou University Medical College, No. 22 Xinling Road, Shantou 515041, China
- Ting Xu
  - Department of Preventive Medicine, Shantou University Medical College, No. 22 Xinling Road, Shantou 515041, China
- Haisheng Wu
  - Department of Preventive Medicine, Shantou University Medical College, No. 22 Xinling Road, Shantou 515041, China
- Pi Guo
  - Department of Preventive Medicine, Shantou University Medical College, No. 22 Xinling Road, Shantou 515041, China
  - Guangdong Provincial Key Laboratory of Infectious Diseases and Molecular Immunopathology, Shantou 515041, China
- Xinzong Zhang
  - National Health Commission Key Laboratory of Male Reproduction and Genetics, Guangzhou, China
  - Department of Andrology, Guangdong Provincial Reproductive Science Institute (Guangdong Provincial Fertility Hospital), China
16
Peng J, Han H, Yi Y, Huang H, Xie L. Machine learning and deep learning modeling and simulation for predicting PM2.5 concentrations. CHEMOSPHERE 2022; 308:136353. [PMID: 36084831 DOI: 10.1016/j.chemosphere.2022.136353] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2022] [Revised: 08/14/2022] [Accepted: 09/02/2022] [Indexed: 06/15/2023]
Abstract
Particulate matter (PM) pollution greatly endangers human physical and mental health, and it is of great practical significance to predict PM concentrations accurately. This study measured one-year monitoring data of six main meteorological parameters and PM2.5 concentrations independently at two monitoring sites in central China's Hunan Province. These datasets were then employed to train, validate, and evaluate the proposed extreme gradient boosting (XGBoost) machine learning model and the fully connected neural network deep learning model, respectively. The performances of the two models were compared, analyzed, and optimized through model parameter tuning. The XGBoost model had better prediction ability, with R2 higher than 0.761 on the complete test dataset. When the complete dataset was divided into stratified sub-sets by daytime-nighttime periods, the value of R2 increased to 0.856 on the nighttime test dataset. The feature importance and influential mechanism of meteorological variables on PM2.5 concentrations were analyzed and discussed.
Affiliation(s)
- Jian Peng
  - School of Minerals Processing and Bioengineering, Central South University, Changsha, 410083, China
- Haisheng Han
  - School of Minerals Processing and Bioengineering, Central South University, Changsha, 410083, China
- Yong Yi
  - Atmospheric Environment Monitoring Department, Changsha Environmental Monitoring Centre of Hunan Province, Changsha, 410001, China
- Huimin Huang
  - Atmospheric Environment Monitoring Department, Changsha Environmental Monitoring Centre of Hunan Province, Changsha, 410001, China
- Le Xie
  - College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, China
17
Tella A, Balogun AL. GIS-based air quality modelling: spatial prediction of PM10 for Selangor State, Malaysia using machine learning algorithms. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2022; 29:86109-86125. [PMID: 34533750 DOI: 10.1007/s11356-021-16150-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/11/2021] [Accepted: 08/20/2021] [Indexed: 06/13/2023]
Abstract
Rapid urbanization has caused severe deterioration of air quality globally, leading to increased hospitalization and premature deaths. Therefore, accurate prediction of air quality is crucial for mitigation planning to support urban sustainability and resilience. Although some studies have predicted air pollutants such as particulate matter (PM) using machine learning algorithms (MLAs), there is a paucity of studies on spatial hazard assessment with respect to the air quality index (AQI). Incorporating PM in AQI studies is crucial because its easily inhalable micro-size has adverse impacts on ecology, environment, and human health. Accurate and timely prediction of the air quality index can ensure adequate intervention to aid air quality management. Therefore, this study undertakes a spatial hazard assessment of the air quality index using particulate matter with a diameter of 10 μm or less (PM10) in Selangor, Malaysia, by developing four machine learning models: eXtreme Gradient Boosting (XGBoost), random forest (RF), K-nearest neighbour (KNN), and Naive Bayes (NB). Spatially processed data such as NDVI, SAVI, BU, LST, Ws, slope, elevation, and road density were used for the modelling. The model was trained with 70% of the dataset, while 30% was used for cross-validation. Results showed that XGBoost has the highest overall accuracy and precision of 0.989 and 0.995, followed by random forest (0.989, 0.993), K-nearest neighbour (0.987, 0.984), and Naive Bayes (0.917, 0.922), respectively. The spatial air quality maps were generated by integrating the geographical information system (GIS) with the four MLAs, which correlated with Malaysia's air pollution index. The maps indicate that air quality in Selangor is satisfactory and poses no threat to health. Nevertheless, the two algorithms with the best performance (XGBoost and RF) indicate that a high percentage of the air quality is moderate. The study concludes that successful air pollution management policies such as green infrastructure practice, improvement of energy efficiency, and restrictions on heavy-duty vehicles can be adopted in Selangor and other Southeast Asian cities to prevent deterioration of air quality in the future.
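The evaluation protocol summarized in this abstract (a 70/30 hold-out split scored by overall accuracy and precision) can be sketched as below. This is a minimal sketch under stated assumptions, not the study's pipeline: the helper names and class labels are hypothetical, and precision is macro-averaged over classes here because the abstract does not say how it was averaged.

```python
import random


def holdout_split(rows, test_fraction=0.3, seed=0):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision (assumed averaging scheme)."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        predicted_as_c = sum(p == c for p in y_pred)
        true_positives = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        # A class that is never predicted contributes a precision of 0.
        per_class.append(true_positives / predicted_as_c if predicted_as_c else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical air-quality classes, for illustration only.
train, test = holdout_split(range(10))
truth = ["good", "good", "moderate", "moderate"]
guess = ["good", "moderate", "moderate", "moderate"]
print(len(train), len(test), overall_accuracy(truth, guess), macro_precision(truth, guess))
```

With spatial data, a purely random split can place neighbouring, highly similar samples on both sides of the split, so scores obtained this way are best read as optimistic.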
Affiliation(s)
- Abdulwaheed Tella
  - Geospatial Analysis and Modelling (GAM) Research Laboratory, Department of Civil and Environmental Engineering, Universiti Teknologi PETRONAS (UTP), 32610, Seri Iskandar, Perak, Malaysia
- Abdul-Lateef Balogun
  - Geospatial Analysis and Modelling (GAM) Research Laboratory, Department of Civil and Environmental Engineering, Universiti Teknologi PETRONAS (UTP), 32610, Seri Iskandar, Perak, Malaysia
18
Yu H, Xu T, Chen J, Yin W, Ye F. Association of inflammation and lung function decline caused by personal PM2.5 exposure: a machine learning approach in time-series data. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2022; 29:80436-80447. [PMID: 35716299 DOI: 10.1007/s11356-022-21457-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/14/2022] [Accepted: 06/10/2022] [Indexed: 06/15/2023]
Abstract
Numerous studies focused on the association between lung function impairment and inflammation caused by fine particulate matter (PM2.5), but the causal relationships are difficult to clarify. In the current study, twenty healthy Chinese young adults who participated in 7 days of observation every four seasons were enrolled, and autoregression models (AM) and classification and regression trees (CART) in a machine learning framework were applied to analyze the association among PM2.5 exposure, inflammation, and lung function from a data structure perspective. There were strong cross-correlations between personal dose of PM2.5 (Dw) and lung functions (vital capacity (VC), forced vital capacity (FVC), etc.). These cross-correlation coefficients were associated with inflammatory indicators (uteroglobin (UG), serum amyloid (SAA), and fractional exhaled nitric oxide (FeNO)). CART reported that inflammatory indicators UG and SAA had the predictive ability of the directional association between Dw and FVC at 1-day lag and that high levels of UG and SAA predicted that PM2.5 exposure induced lung function decline. Consistently, lower lung function indicators at a 2-day lag after personal PM2.5 exposure predicted the high value of inflammatory indicator FeNO. Taken together, we applied machine learning algorithms to analyze repeated measurement data, finding that inflammation and lung function decline caused by PM2.5 could affect each other.
Affiliation(s)
- Hao Yu
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 440305, Guangdong, People's Republic of China
- Tian Xu
  - Department of Occupational and Environmental Health, Key Laboratory of Environment & Health (Huazhong University of Science and Technology), Ministry of Education, State Environmental Protection Key Laboratory of Environment and Health (Wuhan), and State Key Laboratory of Environment Health (Incubation), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Hangkong Road 13, Wuhan, 430030, Hubei, People's Republic of China
- Juan Chen
  - Department of Occupational and Environmental Health, Key Laboratory of Environment & Health (Huazhong University of Science and Technology), Ministry of Education, State Environmental Protection Key Laboratory of Environment and Health (Wuhan), and State Key Laboratory of Environment Health (Incubation), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Hangkong Road 13, Wuhan, 430030, Hubei, People's Republic of China
- Wenjun Yin
  - Department of Occupational and Environmental Health, Key Laboratory of Environment & Health (Huazhong University of Science and Technology), Ministry of Education, State Environmental Protection Key Laboratory of Environment and Health (Wuhan), and State Key Laboratory of Environment Health (Incubation), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Hangkong Road 13, Wuhan, 430030, Hubei, People's Republic of China
- Fang Ye
  - Department of Occupational and Environmental Health, Key Laboratory of Environment & Health (Huazhong University of Science and Technology), Ministry of Education, State Environmental Protection Key Laboratory of Environment and Health (Wuhan), and State Key Laboratory of Environment Health (Incubation), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Hangkong Road 13, Wuhan, 430030, Hubei, People's Republic of China
19
Environmental Pollution Analysis and Impact Study-A Case Study for the Salton Sea in California. ATMOSPHERE 2022. [DOI: 10.3390/atmos13060914] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/04/2023]
Abstract
A natural experiment conducted on the shrinking Salton Sea, a saline lake in California, showed that each one-foot drop in lake elevation resulted in a 2.6% average increase in PM2.5 concentrations. The shrinking has caused the asthma rate among children to continue rising, with one in five children being sent to the emergency department for asthma-related conditions. In this paper, several data-driven machine learning (ML) models are developed for forecasting air quality and dust emission in order to study, evaluate and predict the impacts on human health of the shrinkage of seas such as the Salton Sea. The paper presents an improved long short-term memory (LSTM) model to predict hourly air quality (O3 and CO) based on air pollutant and weather data from the previous 5 h. According to our experimental results, the model achieves very good R2 scores of 0.924 and 0.835 for O3 and CO, respectively. In addition, the paper proposes an ensemble model based on random forest (RF) and gradient boosting (GBoost) algorithms for forecasting hourly PM2.5 and PM10 using the air quality and weather data from the previous 5 h. Furthermore, the paper shares our research results for PM2.5 and PM10 prediction based on the proposed ensemble ML models using satellite remote sensing data. Daily PM2.5 and PM10 concentration maps for 2018 are created to display the regional air pollution density and severity. Finally, the paper reports Artificial Intelligence (AI) based research findings on measuring the impact of air pollution on the asthma prevalence rate of local residents in the Salton Sea region. A stacked ensemble model based on support vector regression (SVR), elastic net regression (ENR), RF and GBoost is developed for asthma prediction, with a good R2 score of 0.978.
20
Zhang P, Yang L, Ma W, Wang N, Wen F, Liu Q. Spatiotemporal estimation of the PM2.5 concentration and human health risks combining the three-dimensional landscape pattern index and machine learning methods to optimize land use regression modeling in Shaanxi, China. ENVIRONMENTAL RESEARCH 2022; 208:112759. [PMID: 35077716 DOI: 10.1016/j.envres.2022.112759] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/31/2021] [Revised: 01/05/2022] [Accepted: 01/16/2022] [Indexed: 06/14/2023]
Abstract
PM2.5 pollution endangers human health and urban sustainable development. Land use regression (LUR) is one of the most important methods to reveal the temporal and spatial heterogeneity of PM2.5, and the introduction of characteristic variables of geographical factors and the improvement of model construction methods are important research directions for its optimization. However, the complex non-linear correlation between PM2.5 and influencing indicators is often not captured by the traditional regression model. The two-dimensional landscape pattern index struggles to reflect the real information of the surface, and the resulting accuracy cannot meet requirements. As such, a novel LUR model (LTX), improved by integrating a three-dimensional landscape pattern index (TDLPI) with machine learning extreme gradient boosting (XGBOOST), was developed to estimate the spatiotemporal heterogeneity of the fine particle concentration in Shaanxi, China, and the health risks of exposure to and inhalation of PM2.5 were explored. The LTX model performed well, with R2 = 0.88, RMSE of 8.73 μg/m3 and MAE of 5.85 μg/m3. Our findings suggest that integrating three-dimensional landscape pattern information with XGBOOST can accurately estimate annual and seasonal variations of PM2.5 pollution. The Guanzhong Plain and northern Shaanxi always feature high PM2.5 values, which exhibit distribution trends similar to those of the observed PM2.5 pollution. This study demonstrated the outstanding performance of the LTX model, which outperforms most models in past research. On the whole, the LTX approach is reliable and can improve the accuracy of pollutant concentration prediction. The health risks of human exposure to fine particles are relatively high in winter. The central part is a high health-risk area, while the northern area is low-risk. Our study provides a new method for assessing atmospheric pollutants, which is important for LUR model optimization, high-precision PM2.5 pollution prediction and landscape pattern planning. These results can also inform human health exposure risk assessment and future epidemiological studies of air pollution.
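For readers comparing the regression scores quoted across these abstracts, the three metrics reported here (R2, RMSE, MAE) can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names and sample values are hypothetical, and R2 is computed as the coefficient of determination, 1 - SS_res/SS_tot, although some studies report the squared Pearson correlation instead.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations (e.g. ug/m3)."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error, in the units of the observations."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination, assuming R2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical PM2.5 concentrations (ug/m3), for illustration only.
measured = [35.0, 48.0, 52.0, 61.0, 44.0]
estimated = [38.0, 45.0, 55.0, 58.0, 47.0]
print(r_squared(measured, estimated), rmse(measured, estimated), mae(measured, estimated))
```

RMSE is never smaller than MAE and weights large errors more heavily, so a gap like the one reported above (8.73 versus 5.85 μg/m3) suggests some comparatively large individual errors.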
Affiliation(s)
- Ping Zhang
  - School of Environmental and Chemical Engineering, Xi'an Polytechnic University, Xi'an, 710048, China; Shaanxi Key Laboratory of Land Consolidation, Xi'an, 710075, China
- Lianwei Yang
  - School of Environmental and Chemical Engineering, Xi'an Polytechnic University, Xi'an, 710048, China
- Wenjie Ma
  - School of Environmental and Chemical Engineering, Xi'an Polytechnic University, Xi'an, 710048, China
- Ning Wang
  - School of Environmental and Chemical Engineering, Xi'an Polytechnic University, Xi'an, 710048, China
- Feng Wen
  - School of Environmental and Chemical Engineering, Xi'an Polytechnic University, Xi'an, 710048, China
- Qi Liu
  - School of Environmental and Chemical Engineering, Xi'an Polytechnic University, Xi'an, 710048, China; The First Institute of Photogrammetry and Remote Sensing, MNR, Xi'an, 710054, China
21
Huang J, Kwan MP, Cai J, Song W, Yu C, Kan Z, Yim SHL. Field Evaluation and Calibration of Low-Cost Air Pollution Sensors for Environmental Exposure Research. SENSORS 2022; 22:s22062381. [PMID: 35336552 PMCID: PMC8948698 DOI: 10.3390/s22062381] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/26/2022] [Revised: 03/15/2022] [Accepted: 03/16/2022] [Indexed: 02/04/2023]
Abstract
This paper seeks to evaluate and calibrate data collected by low-cost particulate matter (PM) sensors in different environments and using different aggregated temporal units (i.e., 5-s, 1-min, 10-min, 30 min intervals). We first collected PM concentrations (i.e., PM1, PM2.5, and PM10) data in five different environments (i.e., indoor and outdoor of an office building, a train platform and lobby of a subway station, and a seaside location) in Hong Kong, using five AirBeam2 sensors as the low-cost sensors and a TSI DustTrak DRX Aerosol Monitor 8533 as the reference sensor. By comparing the collected PM concentrations, we found high linearity and correlation between the data reported by the AirBeam2 sensors in different environments. Furthermore, the results suggest that the accuracy and bias of the PM data reported by the AirBeam2 sensors are affected by rainy weather and environments with high humidity and a high level of hygroscopic salts (i.e., a seaside location). In addition, increasing the aggregation level of the temporal units (i.e., from 5-s to 30 min intervals) increases the correlation between the PM concentrations obtained by the AirBeam2 sensors, while it does not significantly improve the accuracy and bias of the data. Lastly, our results indicate that using a machine learning model (i.e., random forest) for the calibration of PM concentrations collected on sunny days generates better results than those obtained with multiple linear models. These findings have important implications for researchers when designing environmental exposure studies based on low-cost PM sensors.
Affiliation(s)
- Jianwei Huang
  - Institute of Space and Earth Information Science, The Chinese University of Hong Kong, Hong Kong, China
- Mei-Po Kwan
  - Institute of Space and Earth Information Science, The Chinese University of Hong Kong, Hong Kong, China
  - Department of Geography and Resource Management, The Chinese University of Hong Kong, Hong Kong, China
- Jiannan Cai
  - Institute of Space and Earth Information Science, The Chinese University of Hong Kong, Hong Kong, China
- Wanying Song
  - Institute of Space and Earth Information Science, The Chinese University of Hong Kong, Hong Kong, China
- Changda Yu
  - Institute of Space and Earth Information Science, The Chinese University of Hong Kong, Hong Kong, China
- Zihan Kan
  - Institute of Space and Earth Information Science, The Chinese University of Hong Kong, Hong Kong, China
- Steve Hung-Lam Yim
  - Asian School of the Environment, Nanyang Technological University, Singapore 639798, Singapore
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 639798, Singapore
  - Earth Observatory of Singapore, Nanyang Technological University, Singapore 639798, Singapore
22
Liu X, Lu D, Zhang A, Liu Q, Jiang G. Data-Driven Machine Learning in Environmental Pollution: Gains and Problems. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2022; 56:2124-2133. [PMID: 35084840 DOI: 10.1021/acs.est.1c06157] [Citation(s) in RCA: 151] [Impact Index Per Article: 50.3] [Indexed: 06/14/2023]
Abstract
The complexity and dynamics of the environment make it extremely difficult to directly predict and trace the temporal and spatial changes in pollution. In the past decade, the unprecedented accumulation of data, the development of high-performance computing power, and the rise of diverse machine learning (ML) methods provide new opportunities for environmental pollution research. The ML methodology has been used in satellite data processing to obtain ground-level concentrations of atmospheric pollutants, pollution source apportionment, and spatial distribution modeling of water pollutants. However, unlike the active practices of ML in chemical toxicity prediction, advanced algorithms such as deep neural networks in environmental process studies of pollutants are still deficient. In addition, over 40% of the environmental applications of ML go to air pollution, and its application range and acceptance in other aspects of environmental science remain to be increased. The use of ML methods to revolutionize environmental science and its problem-solving scenarios has its own challenges. Several issues should be taken into consideration, such as the tradeoff between model performance and interpretability, prerequisites of the machine learning model, model selection, and data sharing.
Affiliation(s)
- Xian Liu
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, People's Republic of China
- Dawei Lu
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, People's Republic of China
- Aiqian Zhang
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, People's Republic of China
  - School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310012, People's Republic of China
  - College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, People's Republic of China
  - Institute of Environment and Health, Jianghan University, Wuhan 430056, People's Republic of China
- Qian Liu
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, People's Republic of China
  - College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, People's Republic of China
  - Institute of Environment and Health, Jianghan University, Wuhan 430056, People's Republic of China
- Guibin Jiang
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, People's Republic of China
  - School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310012, People's Republic of China
23
Xiong S, Peng Y, Lu S, Shang F, Li X, Yan J, Cen K. Generalized prediction and optimal operating parameters of PCDD/F emissions by explainable Bayesian support vector regression. WASTE MANAGEMENT (NEW YORK, N.Y.) 2021; 135:437-447. [PMID: 34619625 DOI: 10.1016/j.wasman.2021.09.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2021] [Revised: 08/23/2021] [Accepted: 09/19/2021] [Indexed: 06/13/2023]
Abstract
Existing models for predicting polychlorinated dibenzo-p-dioxin and -furan (PCDD/F) emissions from incineration can only be applied to a specific incinerator because of high deviation or systematic errors, and they fail to provide quantitative guidance for operating full-scale municipal solid waste incinerators. To address this problem, explainable Bayesian support vector regression (E-BSVR) was established to predict PCDD/F emissions in a generalizable way and to reduce them as far as possible. First, forty-two PCDD/F samples were obtained from a year-long experiment in a full-scale incinerator. Concurrently, 1,2,4-trichlorobenzene (1,2,4-TrCBz), carbon monoxide, sulfur dioxide, nitrogen oxides, particulate matter, fluoride, and hydrogen chloride were measured as input features. Second, after Box-Cox transformation, normalization, and hyperparameter tuning, the R-squared and root mean square error (RMSE) of the proposed method were 0.983 and 0.044, indicating high accuracy. High accuracy (R-squared = 0.992) and generalization were also demonstrated on a dataset with high PCDD/F emissions. Then, the performance of BSVR was compared with kernel ridge regression, multiple linear regression, and unary linear regression, showing a far smaller RMSE for BSVR. Finally, the optimal operating parameters were calculated through local interpretable model-agnostic explanations and the partial dependence plot. Results indicate that reducing the organic chlorine content of municipal solid waste and inhibiting the Deacon reaction are important methods for reducing PCDD/F emissions. The optimal operating parameters for the maximal reduction of PCDD/F emissions are 1,2,4-TrCBz < 0.098 μg/m3 and fluoride > 0.452 mg/m3. Overall, the E-BSVR method can serve as a reliable and accurate approach for the prediction and reduction of PCDD/F emissions.
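As a rough illustration of the preprocessing and evaluation terms named in this abstract, the sketch below shows a fixed-lambda Box-Cox transform together with the RMSE and R-squared metrics. It is not the authors' E-BSVR pipeline: the function names, the choice of a fixed lambda (rather than one estimated from data), and the toy numbers are all assumptions made for illustration.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a fixed lambda (assumed, not fitted)."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive inputs")
    if lam == 0:
        # The lambda = 0 case is defined as the natural logarithm.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

if __name__ == "__main__":
    # Hypothetical toy values, not data from the study.
    raw = [0.5, 1.0, 2.0, 4.0]
    transformed = box_cox(raw, 0.5)
    predictions = [t + 0.05 for t in transformed]
    print(rmse(transformed, predictions), r_squared(transformed, predictions))
```

In practice the lambda would usually be estimated from the data, and the metrics would be computed on held-out samples rather than on the values used for fitting.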
Affiliation(s)
- Shijian Xiong
- State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou 310027, PR China
| | - Yaqi Peng
- State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou 310027, PR China
| | - Shengyong Lu
- State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou 310027, PR China
| | - Fanjie Shang
- Zhejiang Fuchunjiang Environmental Technology Research Co., Ltd., Hangzhou 311401, PR China
| | - Xiaodong Li
- State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou 310027, PR China
| | - Jianhua Yan
- State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou 310027, PR China
| | - Kefa Cen
- State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou 310027, PR China
|
24
|
Abstract
Air pollution and its consequences are harming the world's population and the environment, which makes air quality monitoring and forecasting techniques essential tools for combating this problem. To predict air quality with maximum accuracy, it is crucial to consider not only the models implemented and the quantity of data but also the types of datasets. This study selected a set of research works in the field of air quality prediction and focuses on exploring the datasets used in them. The most significant findings are: (1) meteorological datasets were used in 94.6% of the papers, far more than any other dataset type, and were complemented by others such as temporal and spatial data; (2) combinations of various datasets have been used since 2009; and (3) open data have been used since 2012, with 32.3% of the studies using open data and 63.4% not providing their data.
|
25
|
Application of Bayesian Additive Regression Trees for Estimating Daily Concentrations of PM2.5 Components. ATMOSPHERE 2020; 11. [PMID: 34322279 PMCID: PMC8315111 DOI: 10.3390/atmos11111233] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/04/2022]
Abstract
Bayesian additive regression tree (BART) is a recent statistical method that combines ensemble learning and nonparametric regression. BART is constructed under a probabilistic framework that also allows for model-based prediction uncertainty quantification. We evaluated the application of BART in predicting daily concentrations of four fine particulate matter (PM2.5) components (elemental carbon, organic carbon, nitrate, and sulfate) in California during the period 2005 to 2014. We demonstrate in this paper how BART can be tuned to optimize prediction performance and how to evaluate variable importance. Our BART models included, as predictors, a large suite of land-use variables, meteorological conditions, satellite-derived aerosol optical depth parameters, and simulations from a chemical transport model. In cross-validation experiments, BART demonstrated good out-of-sample prediction performance at monitoring locations (R2 from 0.62 to 0.73). More importantly, prediction intervals associated with concentration estimates from BART showed good coverage probability at locations with and without monitoring data. In our case study, major PM2.5 components could be estimated with good accuracy, especially when collocated PM2.5 total mass observations were available. In conclusion, BART is an attractive approach for modeling ambient air pollution levels, especially for its ability to provide uncertainty in estimates that may be useful for subsequent health impact and health effect analyses.
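The coverage probability mentioned in this abstract can be checked empirically as the fraction of held-out observations that fall inside their prediction intervals. The following is a minimal sketch under that assumption; the function name and the toy values are hypothetical, and the interval bounds would in practice come from a fitted model such as BART rather than being typed in by hand.

```python
def interval_coverage(observed, lower, upper):
    """Fraction of observations lying within [lower, upper], bounds inclusive."""
    if not (len(observed) == len(lower) == len(upper)):
        raise ValueError("observed, lower, and upper must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    hits = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return hits / len(observed)

if __name__ == "__main__":
    # Hypothetical held-out concentrations and interval bounds.
    y = [1.0, 2.0, 3.0, 4.0]
    lo = [0.5, 2.5, 2.0, 3.0]
    hi = [1.5, 3.5, 4.0, 3.5]
    print(interval_coverage(y, lo, hi))
```

For nominal 95% intervals, an empirical coverage close to 0.95 on held-out data would suggest that the uncertainty estimates are well calibrated.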
|