1
Shi H, Yang X, Tang H, Tu Y. Temporally boosting neural network for improving dynamic prediction of PM 2.5 concentration with changing and unbalanced distribution. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2025; 383:125371. [PMID: 40267806 DOI: 10.1016/j.jenvman.2025.125371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 03/17/2025] [Accepted: 04/12/2025] [Indexed: 04/25/2025]
Abstract
Increasing medical research evidence suggests that even low PM2.5 concentrations may trigger significant health issues. Hence, accurate prediction of PM2.5 is of great importance for safeguarding public health. However, current data-driven predictive methods exhibit seasonal declines in model performance and difficulty in predicting extremely high values. These issues may stem from neglecting two crucial features of PM2.5 data streams, i.e., concept drift and imbalanced distribution. In this study, we validate this hypothesis through an in-depth analysis of the characteristics of the PM2.5 data stream and of the prediction errors of three mainstream models trained on it, i.e., random forest, convolutional neural network, and transformer. Based on the identified types of concept drift and the patterns of imbalanced distribution, we introduce the Temporally boosting neural network (Temp-boost), a novel ensemble learning method designed to enhance predictive accuracy by integrating static and dynamic models. Static models, which are trained on balanced historical datasets, typically receive infrequent updates. Conversely, dynamic models are trained on newly arrived data and undergo more frequent updates. We evaluated the performance of Temp-boost and the three aforementioned models in predicting gridded PM2.5 concentrations across the North China Plain in 2019. Compared with the three models, Temp-boost shows improved prediction accuracy across seasons, with notable gains at high pollution levels. Specifically, for pollution levels above lightly polluted, Temp-boost reduces the average MAE by 13.22 μg/m³ and the average RMSE by 13.32 μg/m³, with reductions peaking at 26.45 μg/m³ (MAE) and 25.76 μg/m³ (RMSE) in more severe cases.
Affiliation(s)
- Haoze Shi
- State Key Laboratory of Remote Sensing Science, Faculty of Geographical Science, Beijing Normal University, Beijing, 100875, PR China.
- Xin Yang
- State Key Laboratory of Remote Sensing Science, Faculty of Geographical Science, Beijing Normal University, Beijing, 100875, PR China.
- Hong Tang
- State Key Laboratory of Remote Sensing Science, Faculty of Geographical Science, Beijing Normal University, Beijing, 100875, PR China.
- Yuhong Tu
- State Key Laboratory of Remote Sensing Science, Faculty of Geographical Science, Beijing Normal University, Beijing, 100875, PR China.
2
Wei Q, Chen Y, Zhang H, Jia Z, Yang J, Niu B. Simulation and prediction of PM2.5 concentrations and analysis of driving factors using interpretable tree-based models in Shanghai, China. ENVIRONMENTAL RESEARCH 2025; 270:121003. [PMID: 39894148 DOI: 10.1016/j.envres.2025.121003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2024] [Revised: 01/18/2025] [Accepted: 01/28/2025] [Indexed: 02/04/2025]
Abstract
PM2.5 is a critical air pollutant, and understanding its drivers is essential for regional air quality control. This study employed meteorological and pollutant variables to predict PM2.5 concentrations in Shanghai using interpretable tree-based models. The random forest (RF) model performed best, achieving MAE, RMSE, MBE, and R2 values of 3.279, 4.609, 1.254, and 0.971, respectively, improving accuracy by 42.1%-85.5% compared to AdaBoost. Shapley additive explanations (SHAP) analysis identified CO, SO2, and O3 as the most influential factors. Partial dependence plots (PDPs) showed SO2 had the strongest impact below 40 μg/m³, while NO2 exhibited a linear positive correlation with PM2.5 up to 60 μg/m³. Atmospheric pressure and rainfall were negatively correlated with PM2.5, with notable reductions in concentrations under high-pressure conditions and rainfall levels between 0 and 20 mm. Temperature and relative humidity showed complex relationships, with sharp increases in PM2.5 at temperatures between -5 °C and 15 °C and SHAP values declining for humidity above 90%. Wind speed exhibited a non-linear effect, with minimal influence at higher velocities. The combined effects of different pollutants can be intensified significantly at higher levels. These findings offer valuable guidance for urban air quality management and pollution mitigation strategies.
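For readers unfamiliar with the four scores quoted in this abstract, the following is a minimal sketch of their standard textbook definitions. It is not the authors' code: the function name `regression_metrics`, the sign convention for MBE (predicted minus observed), and the sample values are all assumptions made for illustration.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, RMSE, MBE and R^2 for paired observed/predicted values.

    Assumption: errors are taken as predicted - observed, so a positive
    MBE means the model over-predicts on average.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)

    mae = sum(abs(e) for e in errors) / n             # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # root mean squared error
    mbe = sum(errors) / n                             # mean bias error

    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # R^2 is undefined when all observations are identical (ss_tot == 0).
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "RMSE": rmse, "MBE": mbe, "R2": r2}


# Made-up numbers for illustration only; these are not data from the study.
example = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 41.0])
```

MAE and RMSE carry the units of the target variable, while R² is dimensionless, which is why the abstract can report them side by side without units.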
Affiliation(s)
- Qing Wei
- College of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; State Key Laboratory of Pollution Control and Resource Utilization, Tongji University, Shanghai 200092, China.
- Yongqi Chen
- College of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; State Key Laboratory of Pollution Control and Resource Utilization, Tongji University, Shanghai 200092, China
- Huijin Zhang
- College of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; State Key Laboratory of Pollution Control and Resource Utilization, Tongji University, Shanghai 200092, China
- Zichen Jia
- College of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; State Key Laboratory of Pollution Control and Resource Utilization, Tongji University, Shanghai 200092, China
- Ju Yang
- Guangdong Institute of Water Resources and Hydropower Research, Guangzhou 510000, China
- Bin Niu
- PowerChina East China Survey, Design and Research Institute Co., Ltd, Hangzhou 310000, China
3
Guyu Z, Xiaoyuan Y, Jiansen S, Hongdou H, Qian W. A PM 2.5 spatiotemporal prediction model based on mixed graph convolutional GRU and self-attention network. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2025; 368:125748. [PMID: 39929428 DOI: 10.1016/j.envpol.2025.125748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2024] [Revised: 01/05/2025] [Accepted: 01/24/2025] [Indexed: 02/14/2025]
Abstract
The increase in atmospheric pollution has made it essential to develop accurate models for predicting pollutant concentrations. Existing research faces challenges such as neglecting the selection of significant information from local and neighboring stations and paying insufficient attention to long-term historical data patterns. Therefore, this paper proposes a spatiotemporal prediction model called MGCGRU-SAN, which leverages long-term historical data to predict PM2.5 concentrations across multiple stations and multiple future time steps. First, we employ the Mixed Graph Convolutional GRU (MGCGRU) module to capture the spatiotemporal dependencies in short-term historical time series from various stations. Second, the long-term PM2.5 historical time series (e.g., one week) is divided into uniformly sized segments and fed into the Self-Attention Network (SAN) module to capture long-term latent temporal patterns. Together, these modules enable the model not only to capture short-term fluctuations but also to identify and track long-term temporal patterns and trends during prediction. Finally, we conduct extensive comparative and ablation experiments using historical air pollutant and meteorological data from the Beijing-Tianjin-Hebei region. The experimental results demonstrate that the model, after capturing the long-term latent temporal patterns, achieves improvements of 9.62%, 6.33%, and 4.98% in the RSE, MAE, and RMSE evaluation metrics, respectively, during multi-step prediction. Overall, the model outperforms the best baseline model by an average of 8.34%, 6.12%, 4.06%, and 2.60% in the RSE, MAE, RMSE, and Correlation metrics, respectively, showing superior performance in multi-station long-term predictions.
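The segmentation step mentioned in this abstract (cutting a long history into uniformly sized pieces before a self-attention module) can be pictured with a small sketch. The abstract does not give the sampling interval or the segment length, so hourly data and 24-hour segments are assumptions here, and `split_into_segments` is a hypothetical helper, not part of MGCGRU-SAN.

```python
def split_into_segments(series, segment_length):
    """Split a sequence into consecutive, equally sized segments.

    Raises ValueError if the series cannot be divided evenly, so that
    every segment has exactly `segment_length` items.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    if len(series) % segment_length != 0:
        raise ValueError("series length must be a multiple of segment_length")
    return [
        series[start:start + segment_length]
        for start in range(0, len(series), segment_length)
    ]


# Assumed setup: one week of hourly readings (7 * 24 = 168 values),
# represented here by placeholder integers rather than real measurements.
week_of_hours = list(range(168))
segments = split_into_segments(week_of_hours, 24)  # assumed 24-hour segments
```

Each segment would then be treated as one token-like unit by whatever sequence model consumes it; how the paper embeds the segments is not stated in the abstract.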
Affiliation(s)
- Zhao Guyu
- School of Information Science and Engineering, Yanshan University, Qinhuangdao, 066000, Hebei, China
- Yang Xiaoyuan
- School of Information Science and Engineering, Yanshan University, Qinhuangdao, 066000, Hebei, China
- Shi Jiansen
- School of Information Science and Engineering, Yanshan University, Qinhuangdao, 066000, Hebei, China
- He Hongdou
- School of Information Science and Engineering, Yanshan University, Qinhuangdao, 066000, Hebei, China.
- Wang Qian
- School of Information Science and Engineering, Yanshan University, Qinhuangdao, 066000, Hebei, China
4
Lin TC, Chiueh PT, Hsiao TC. Challenges in Observation of Ultrafine Particles: Addressing Estimation Miscalculations and the Necessity of Temporal Trends. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2025; 59:565-577. [PMID: 39670560 PMCID: PMC11741106 DOI: 10.1021/acs.est.4c07460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2024] [Revised: 11/29/2024] [Accepted: 12/02/2024] [Indexed: 12/14/2024]
Abstract
Ultrafine particles (UFPs) pose a significant health risk, making comprehensive assessment essential. The influence of emission sources on particle concentrations is not only constrained by meteorological conditions but often intertwined with them, making it challenging to separate these effects. This study utilized long-term particle number and size distribution (PNSD) data from 2018 to 2023 to develop a tree-based machine learning model enhanced with an interpretable component, incorporating temporal markers to characterize background or time series residuals. Our results demonstrated that, unlike PM2.5, which is significantly shaped by planetary boundary layer height, the particle number concentration (PNC) is determined largely by wind speed and shows strong regional specificity. Furthermore, we systematically identified and analyzed anthropogenically influenced periodic trends. Notably, while Aitken mode observations are initially linked to traffic-related peaks, both Aitken and nucleation modes contribute to short-term concentration peaks during rush-hour periods after deweathering adjustment. Pollutant baseline concentrations are largely driven by human activities, with meteorological factors modulating their variability, and the secondary formation of UFPs is likely reflected in temporal residuals. This study provides a flexible framework for isolating meteorological effects, allowing more accurate assessment of anthropogenic impacts and targeted management strategies for UFP and PNC.
Affiliation(s)
- Tzu-Chi Lin
- Graduate Institute of Environmental Engineering, College of Engineering, National Taiwan University, 71 Chou-Shan Road, Taipei 106, Taiwan
- Pei-Te Chiueh
- Graduate Institute of Environmental Engineering, College of Engineering, National Taiwan University, 71 Chou-Shan Road, Taipei 106, Taiwan
- Ta-Chih Hsiao
- Graduate Institute of Environmental Engineering, College of Engineering, National Taiwan University, 71 Chou-Shan Road, Taipei 106, Taiwan
- Research Center for Environmental Changes, Academia Sinica, Taipei 115, Taiwan
5
Xia Y, McCracken T, Liu T, Chen P, Metcalf A, Fan C. Understanding the Disparities of PM2.5 Air Pollution in Urban Areas via Deep Support Vector Regression. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2024; 58:8404-8416. [PMID: 38698567 DOI: 10.1021/acs.est.3c09177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
In densely populated urban areas, PM2.5 has a direct impact on residents' health and quality of life. Thus, understanding the disparities of PM2.5 is crucial for ensuring urban sustainability and public health. Traditional prediction models often overlook the spillover effects within urban areas and the complexity of the data, leading to inaccurate spatial predictions of PM2.5. We propose Deep Support Vector Regression (DSVR), which models the urban area as a graph, with grid center points as the nodes and the connections between grids as the edges. Natural and human-activity features of each grid are used to initialize the representation of each node. Based on the graph, DSVR uses random diffusion-based deep learning to quantify the spillover effects of PM2.5. It leverages random walks to uncover more extensive spillover relationships between nodes, thereby capturing both the local and nonlocal spillover effects of PM2.5. It then performs predictive learning using the feature vectors that encapsulate spillover effects, enhancing the understanding of PM2.5 disparities and connections across different regions. By applying our proposed model in the northern region of New York for predictive performance analysis, we found that DSVR consistently outperforms other models. During periods of PM2.5 surges, the R-square of DSVR reaches as high as 0.729, outperforming non-spillover models by 2.5 to 5.7 times and traditional spatial metric models by 2.2 to 4.6 times. Therefore, our proposed model holds significant importance for understanding disparities of PM2.5 air pollution in urban areas, taking the first steps toward a new method that considers both the spillover effects and the nonlinear features of the data for prediction.
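To make the graph framing in this abstract concrete, here is a generic sketch of a grid modelled as a graph and a plain random walk over it. It is not the DSVR implementation: 4-neighbour connectivity, uniform transition probabilities, and the helper names `grid_graph` and `random_walk` are all assumptions, since the abstract does not specify how edges or walks are defined.

```python
import random


def grid_graph(rows, cols):
    """Build an adjacency list for a rows x cols grid.

    Assumption: each cell is a node connected to its up/down/left/right
    neighbours only.
    """
    adjacency = {}
    for r in range(rows):
        for c in range(cols):
            neighbours = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    neighbours.append((nr, nc))
            adjacency[(r, c)] = neighbours
    return adjacency


def random_walk(adjacency, start, steps, rng):
    """Walk `steps` moves from `start`, picking a neighbour uniformly each time.

    Requires every visited node to have at least one neighbour
    (so a 1 x 1 grid is not supported).
    """
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(adjacency[path[-1]]))
    return path


graph = grid_graph(3, 3)
# A seeded generator keeps the illustrative walk reproducible.
walk = random_walk(graph, (1, 1), 5, random.Random(0))
```

Because a walk can reach cells several hops away from its start, nodes that co-occur on walks need not be direct neighbours, which is the intuition behind capturing nonlocal relationships with this kind of procedure.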
Affiliation(s)
- Yuling Xia
- School of Mathematics, Southwest Jiaotong University, Sichuan province Chengdu 611756, China
- Teague McCracken
- School of Civil and Environmental Engineering and Earth Sciences, Clemson University, Clemson, South Carolina 29634, United States
- Tong Liu
- School of Civil and Environmental Engineering and Earth Sciences, Clemson University, Clemson, South Carolina 29634, United States
- Pei Chen
- Department of Computer Science and Engineering, Texas A&M University, College Station, Texas 77843, United States
- Andrew Metcalf
- School of Civil and Environmental Engineering and Earth Sciences, Clemson University, Clemson, South Carolina 29634, United States
- Chao Fan
- School of Civil and Environmental Engineering and Earth Sciences, Clemson University, Clemson, South Carolina 29634, United States
6
Wang H, Zhang L, Wu R, Cen Y. Spatio-temporal fusion of meteorological factors for multi-site PM2.5 prediction: A deep learning and time-variant graph approach. ENVIRONMENTAL RESEARCH 2023; 239:117286. [PMID: 37797668 DOI: 10.1016/j.envres.2023.117286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 09/29/2023] [Accepted: 09/30/2023] [Indexed: 10/07/2023]
Abstract
In the field of environmental science, traditional methods for predicting PM2.5 concentrations primarily focus on a single temporal or spatial dimension. This approach is limited in its ability to mine the joint influence of multiple monitoring sites and their inherent connections with meteorological factors. To address this issue, we introduce a deep-learning-based multi-graph model using Beijing as the study case. The model consists of two key modules. The first, the 'Meteorological Factor Spatio-Temporal Feature Extraction Module', integrates the spatio-temporal features of hourly meteorological data by employing Graph Convolutional Networks (GCN) for spatial encoding and Long Short-Term Memory (LSTM) for temporal encoding. Subsequently, through an attention mechanism, it retrieves a feature tensor associated with air pollutants. Second, these features are combined with PM2.5 concentration values, allowing the 'PM2.5 Concentration Prediction Module' to predict with enhanced accuracy the joint influence across multiple monitoring sites. Our model exhibits significant advantages over traditional methods in processing the joint impact of multiple sites and their associated meteorological factors. By providing new perspectives and tools for understanding urban air pollutant distribution and optimizing air quality management, this model moves us towards a more comprehensive approach to tackling air pollution.
Affiliation(s)
- Hongqing Wang
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, 100094, China; University of Chinese Academy of Sciences, Beijing, 100049, China.
- Lifu Zhang
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, 100094, China.
- Rong Wu
- Department of Mathematical Sciences, Tsinghua University, Beijing, 100084, China.
- Yi Cen
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, 100094, China.
7
A hybrid deep learning framework for air quality prediction with spatial autocorrelation during the COVID-19 pandemic. Sci Rep 2023; 13:1015. [PMID: 36653488 PMCID: PMC9848720 DOI: 10.1038/s41598-023-28287-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/14/2022] [Accepted: 01/16/2023] [Indexed: 01/20/2023]
Abstract
China implemented a strict lockdown policy to prevent the spread of COVID-19 in the worst-affected regions, including Wuhan and Shanghai. This study aims to investigate the impact of these lockdowns on the air quality index (AQI) using a deep learning framework. In addition to historical pollutant concentrations and meteorological factors, we incorporate social and spatio-temporal influences in the framework. In particular, spatial autocorrelation (SAC), which combines temporal autocorrelation with spatial correlation, is adopted to reflect the influence of neighbouring cities and historical data. Our deep learning analysis estimated the lockdown effects as -25.88 in Wuhan and -20.47 in Shanghai. The corresponding prediction errors are reduced by about 47% for Wuhan and by 67% for Shanghai, which enables much more reliable AQI forecasts for both cities.