1
Fung PL, Savadkoohi M, Zaidan MA, Niemi JV, Timonen H, Pandolfi M, Alastuey A, Querol X, Hussein T, Petäjä T. Constructing transferable and interpretable machine learning models for black carbon concentrations. ENVIRONMENT INTERNATIONAL 2024; 184:108449. [PMID: 38286044 DOI: 10.1016/j.envint.2024.108449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 01/12/2024] [Accepted: 01/17/2024] [Indexed: 01/31/2024]
Abstract
Black carbon (BC) has received increasing attention from researchers due to its adverse health effects. However, in-situ BC measurements are often not included as a regulated variable in air quality monitoring networks. Machine learning (ML) models have been studied extensively as virtual sensors that complement the reference instruments. This study evaluates and compares three white-box (WB) and four black-box (BB) ML models for estimating BC concentrations, with a focus on their transferability and interpretability. We train the models with long-term air pollutant and weather measurements from an urban background site in Barcelona and test them at other European urban and traffic sites. Despite the differences in geographical location and measurement site, BC correlates most strongly with the particle number concentration of the accumulation mode (PNacc, r = 0.73-0.85) and nitrogen dioxide (NO2, r = 0.68-0.85) and most weakly with meteorological parameters. Owing to this similar correlation behaviour, the ML models trained in Barcelona perform well at the traffic site in Helsinki (R² = 0.80-0.86; mean absolute error MAE = 3.90-4.73 %) and at the urban background site in Dresden (R² = 0.79-0.84; MAE = 4.23-4.82 %). WB models appear to explain less of the variability in BC than BB models, among which the long short-term memory (LSTM) model outperforms the rest. In terms of interpretability, we adopt several methods for the individual models to quantify and normalize the relative importance of each input feature. The overall static relative importance commonly used for WB models gives results that differ from the dynamic values used to show local contributions for BB models. PNacc and NO2 on average have the strongest absolute static contribution; however, they impact the estimation both positively and negatively at different sites. This comprehensive analysis demonstrates the possibility of transferring these interpretable air pollutant ML models across space and time.
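The abstract does not say which importance methods were used for each model. As one illustration of quantifying and normalizing the relative importance of input features, the sketch below uses permutation importance, which is an assumption here and not necessarily the authors' choice. The toy model, its coefficients, and the synthetic data are hypothetical.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y_true, n_repeats=20, seed=0):
    """Return one normalized importance per feature.

    Each feature column is shuffled in turn; the mean increase in MAE over
    n_repeats shuffles is that feature's raw importance. Raw values are then
    scaled so that they sum to 1 (when at least one feature matters).
    """
    rng = random.Random(seed)
    baseline = mean_absolute_error(y_true, [predict(r) for r in rows])
    raw = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            error = mean_absolute_error(y_true, [predict(r) for r in shuffled])
            increases.append(error - baseline)
        raw.append(max(0.0, sum(increases) / n_repeats))
    total = sum(raw)
    return [v / total if total > 0 else 0.0 for v in raw]


# Hypothetical toy model: the output depends strongly on feature 0,
# weakly on feature 1, and not at all on feature 2.
def toy_model(row):
    return 0.8 * row[0] + 0.2 * row[1]


data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [toy_model(r) for r in rows]
importance = permutation_importance(toy_model, rows, y)
```

A global score of this kind corresponds to the "static" view described in the abstract; it says nothing about the sign or the site-specific, per-sample contribution of a feature.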
Affiliation(s)
- Pak Lun Fung
  - Institute for Atmospheric and Earth System Research / Physics, Faculty of Science, University of Helsinki, Helsinki FI-00560, Finland; Helsinki Institute of Sustainability Science, Faculty of Science, University of Helsinki, Helsinki FI-00560, Finland.
- Marjan Savadkoohi
  - Institute of Environmental Assessment and Water Research (IDAEA-CSIC), Barcelona, Spain; Department of Mining, Industrial and ICT Engineering (EMIT), Manresa School of Engineering (EPSEM), Universitat Politècnica de Catalunya (UPC), Manresa 08242, Spain.
- Martha Arbayani Zaidan
  - Institute for Atmospheric and Earth System Research / Physics, Faculty of Science, University of Helsinki, Helsinki FI-00560, Finland; Helsinki Institute of Sustainability Science, Faculty of Science, University of Helsinki, Helsinki FI-00560, Finland; Department of Computer Science, Faculty of Science, University of Helsinki, Helsinki FI-00560, Finland.
- Jarkko V Niemi
  - Helsinki Region Environmental Services Authority (HSY), Helsinki FI-00066, Finland.
- Hilkka Timonen
  - Atmospheric Composition Research, Finnish Meteorological Institute, Helsinki FI-00560, Finland.
- Marco Pandolfi
  - Institute of Environmental Assessment and Water Research (IDAEA-CSIC), Barcelona, Spain.
- Andrés Alastuey
  - Institute of Environmental Assessment and Water Research (IDAEA-CSIC), Barcelona, Spain.
- Xavier Querol
  - Institute of Environmental Assessment and Water Research (IDAEA-CSIC), Barcelona, Spain.
- Tareq Hussein
  - Institute for Atmospheric and Earth System Research / Physics, Faculty of Science, University of Helsinki, Helsinki FI-00560, Finland; Environmental and Atmospheric Research Laboratory (EARL), Department of Physics, School of Science, Amman 11942, Jordan.
- Tuukka Petäjä
  - Institute for Atmospheric and Earth System Research / Physics, Faculty of Science, University of Helsinki, Helsinki FI-00560, Finland.
2
Ma X, Zou B, Deng J, Gao J, Longley I, Xiao S, Guo B, Wu Y, Xu T, Xu X, Yang X, Wang X, Tan Z, Wang Y, Morawska L, Salmond J. A comprehensive review of the development of land use regression approaches for modeling spatiotemporal variations of ambient air pollution: A perspective from 2011 to 2023. ENVIRONMENT INTERNATIONAL 2024; 183:108430. [PMID: 38219544 DOI: 10.1016/j.envint.2024.108430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Revised: 11/26/2023] [Accepted: 01/04/2024] [Indexed: 01/16/2024]
Abstract
Land use regression (LUR) models are widely used in epidemiological and environmental studies to estimate humans' exposure to air pollution within urban areas. However, the early models, developed using linear regressions and data from fixed monitoring stations and passive sampling, were primarily designed to model traditional and criteria air pollutants and had limitations in capturing high-resolution spatiotemporal variations of air pollution. Over the past decade, there has been a notable development of multi-source observations from low-cost monitors, mobile monitoring, and satellites, in conjunction with the integration of advanced statistical methods and spatially and temporally dynamic predictors, which have facilitated significant expansion and advancement of LUR approaches. This paper reviews and synthesizes the recent advances in LUR approaches from the perspectives of the changes in air quality data acquisition, novel predictor variables, advances in model-developing approaches, improvements in validation methods, model transferability, and modeling software as reported in 155 LUR studies published between 2011 and 2023. We demonstrate that these developments have enabled LUR models to be developed for larger study areas and encompass a wider range of criteria and unregulated air pollutants. LUR models in the conventional spatial structure have been complemented by more complex spatiotemporal structures. Compared with linear models, advanced statistical methods yield better predictions when handling data with complex relationships and interactions. Finally, this study explores new developments, identifies potential pathways for further breakthroughs in LUR methodologies, and proposes future research directions. In this context, LUR approaches have the potential to make a significant contribution to future efforts to model the patterns of long- and short-term exposure of urban populations to air pollution.
Affiliation(s)
- Xuying Ma
  - College of Geomatics, Xi'an University of Science and Technology, Xi'an 710054, China; College of Safety Science and Engineering, Xi'an University of Science and Technology, Xi'an 710054, China; International Laboratory for Air Quality and Health, Queensland University of Technology, Brisbane, Queensland 4000, Australia.
- Bin Zou
  - School of Geosciences and Info-Physics, Central South University, Changsha, Hunan 410083, China.
- Jun Deng
  - College of Safety Science and Engineering, Xi'an University of Science and Technology, Xi'an 710054, China; Shaanxi Key Laboratory of Prevention and Control of Coal Fire, Xi'an University of Science and Technology, Xi'an 710054, China.
- Jay Gao
  - School of Environment, Faculty of Science, University of Auckland, Auckland 1010, New Zealand.
- Ian Longley
  - National Institute of Water and Atmospheric Research, Auckland 1010, New Zealand.
- Shun Xiao
  - School of Geography and Tourism, Shaanxi Normal University, Xi'an 710119, China.
- Bin Guo
  - College of Geomatics, Xi'an University of Science and Technology, Xi'an 710054, China.
- Yarui Wu
  - College of Geomatics, Xi'an University of Science and Technology, Xi'an 710054, China.
- Tingting Xu
  - School of Software Engineering, Chongqing University of Post and Telecommunications, Chongqing 400065, China.
- Xin Xu
  - Xi'an Institute for Innovative Earth Environment Research, Xi'an 710061, China.
- Xiaosha Yang
  - Shandong Nova Fitness Co., Ltd., Baoji, Shaanxi 722404, China.
- Xiaoqi Wang
  - College of Geomatics, Xi'an University of Science and Technology, Xi'an 710054, China.
- Zelei Tan
  - College of Geomatics, Xi'an University of Science and Technology, Xi'an 710054, China.
- Yifan Wang
  - College of Geomatics, Xi'an University of Science and Technology, Xi'an 710054, China.
- Lidia Morawska
  - International Laboratory for Air Quality and Health, Queensland University of Technology, Brisbane, Queensland 4000, Australia.
- Jennifer Salmond
  - School of Environment, Faculty of Science, University of Auckland, Auckland 1010, New Zealand.
3
Mamić L, Gašparović M, Kaplan G. Developing PM2.5 and PM10 prediction models on a national and regional scale using open-source remote sensing data. ENVIRONMENTAL MONITORING AND ASSESSMENT 2023; 195:644. [PMID: 37149506 PMCID: PMC10164030 DOI: 10.1007/s10661-023-11212-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Accepted: 04/03/2023] [Indexed: 05/08/2023]
Abstract
Clean air is a prerequisite for a healthy life. Air quality is an issue that has been receiving well-deserved attention in the last few years. From a remote sensing point of view, the Sentinel-5P TROPOMI mission, the first Copernicus mission with the main purpose of monitoring the atmosphere and tracking air pollutants, has been widely used worldwide. Particulate matter with a diameter smaller than 2.5 and 10 μm (PM2.5 and PM10) significantly determines air quality. Still, no available satellite sensor can track it remotely with high accuracy; this is possible only with ground stations. This research aims to estimate PM2.5 and PM10 using Sentinel-5P and other open-source remote sensing data available on the Google Earth Engine (GEE) platform for heating (December 2021, January, and February 2022) and non-heating seasons (June, July, and August 2021) on the territory of the Republic of Croatia. Ground stations of the National Network for Continuous Air Quality Monitoring were used as a starting point and as ground truth data. Raw hourly data were matched to remote sensing data, and seasonal models were trained at the national and regional scale using machine learning. The proposed approach uses a random forest algorithm with a percentage split of 70% and gives moderate to high accuracy regarding the temporal frame of the data. The mapping gives visual insight into the relationship between the ground and remote sensing data and shows the seasonal variations of PM2.5 and PM10. The results showed that the proposed approach and models could efficiently estimate air quality.
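The "percentage split of 70%" mentioned above can be sketched as follows. This is a minimal illustration that assumes a shuffled split; the abstract does not say whether the split was random or chronological. The `records` list is hypothetical stand-in data, not the study's matched dataset.

```python
import random


def percentage_split(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples and return (train, test) lists.

    train_fraction of the samples go to the training set; the remainder
    is held out for evaluation.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical matched records: (remote-sensing predictor, ground-station value).
records = [(i, 10.0 + 0.5 * i) for i in range(100)]
train, test = percentage_split(records)
```

A model such as a random forest would then be fitted on `train` and scored on `test`. For hourly time series, a chronological split may give a more conservative accuracy estimate than a shuffled one.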
Affiliation(s)
- Luka Mamić
  - Department of Civil, Building and Environmental Engineering, Sapienza University of Rome, Rome, Italy.
  - Department of Land, Environment, Agriculture and Forestry (TESAF), University of Padua, Padova, Italy.
- Mateo Gašparović
  - Chair of Photogrammetry and Remote Sensing, Faculty of Geodesy, University of Zagreb, Zagreb, Croatia.
- Gordana Kaplan
  - Institute of Earth and Space Sciences, Eskisehir Technical University, Eskisehir, Turkey.
4
5
Chen PC, Lin YT. Exposure assessment of PM2.5 using smart spatial interpolation on regulatory air quality stations with clustering of densely-deployed microsensors. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2022; 292:118401. [PMID: 34695517 DOI: 10.1016/j.envpol.2021.118401] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/13/2021] [Revised: 10/20/2021] [Accepted: 10/21/2021] [Indexed: 06/13/2023]
Abstract
Accurate mapping of air pollutants is essential for epidemiological studies and environmental risk assessments. Concentrations measured by air quality monitoring stations (AQMS) have primarily been used to assess exposure to PM2.5. However, the low coverage and small number of monitoring stations increase the errors of spatial interpolation or geostatistical estimates. In contrast to other integrated approaches developed for improved air pollution estimates, this study utilizes data from low-cost microsensors densely deployed in Taiwan to improve the popular spatial interpolation approach called inverse distance weighting (IDW). A large dataset from thousands of low-cost sensors could improve spatial interpolation by describing the distribution of PM2.5 in detail. Therefore, this study presents a clustering-based method to assess the distribution of PM2.5. Then, a smarter IDW is performed based on correlated observations from the selected air quality stations. The publicly available data chosen for this investigation pertained to Taiwan, which has deployed 74 monitoring stations and more than 11,000 low-cost sensors since December 2020. The results of leave-one-out cross-validation indicate that the developed approach yields lower PM2.5 estimation errors than kriging across almost all of the months and sampled dates of 2019 and 2020, particularly those with higher PM2.5 spatial heterogeneity. Spatial heterogeneity could result in more significant estimation errors in mainstream approaches. The root mean square error of the monthly average estimate for PM2.5 ranged from 1.17 to 3.86 μg/m³. We also found that the clustering of one month characterizing the pattern of PM2.5 distribution could perform well in spatial interpolations based on historical data from monitoring stations. According to the information on the OpenAQ platform, low-cost sensors are in demand in cities and areas. This trend might pave the way for the application of the proposed approach in other areas for superior exposure assessments.
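For orientation, plain inverse distance weighting, the baseline that the study sets out to improve, can be sketched as below. This is standard IDW only and does not implement the paper's clustering-based station selection. The planar coordinates, the `power` value of 2, and the station readings are illustrative assumptions.

```python
import math


def idw_estimate(target, stations, power=2.0):
    """Estimate a value at `target` from nearby station observations.

    target   -- (x, y) coordinates of the location to estimate
    stations -- list of ((x, y), value) pairs
    power    -- distance exponent; larger values favour closer stations
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in stations:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0.0:
            # The target coincides with a station: return its reading directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical stations on a flat grid with PM2.5 readings in μg/m³.
stations = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
midpoint = idw_estimate((1.0, 0.0), stations)   # equidistant, so the plain mean
at_station = idw_estimate((2.0, 0.0), stations)
```

A cluster-aware variant in the spirit of the abstract would restrict `stations` to those whose observations are correlated with the target location before calling the function.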
Affiliation(s)
- Pi-Cheng Chen
  - Department of Environmental Engineering, National Cheng Kung University, Taiwan.
- Yu-Ting Lin
  - Department of Environmental Engineering, National Cheng Kung University, Taiwan.
6
Artificial Neural Networks, Sequence-to-Sequence LSTMs, and Exogenous Variables as Analytical Tools for NO2 (Air Pollution) Forecasting: A Case Study in the Bay of Algeciras (Spain). SENSORS 2021; 21:s21051770. [PMID: 33806409 PMCID: PMC7961900 DOI: 10.3390/s21051770] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/11/2021] [Revised: 02/22/2021] [Accepted: 02/28/2021] [Indexed: 11/17/2022]
Abstract
This study aims to produce accurate predictions of the NO2 concentrations at a specific station of a monitoring network located in the Bay of Algeciras (Spain). Artificial neural networks (ANNs) and sequence-to-sequence long short-term memory networks (LSTMs) were used to create the forecasting models. Additionally, a new prediction method was proposed combining LSTMs using a rolling window scheme with a cross-validation procedure for time series (LSTM-CVT). Two different strategies were followed regarding the input variables: using NO2 from the station or employing NO2 and other pollutants data from any station of the network plus meteorological variables. The ANN and LSTM-CVT exogenous models used lagged datasets of different window sizes. Several feature ranking methods were used to select the top lagged variables and include them in the final exogenous datasets. Prediction horizons of t + 1, t + 4 and t + 8 were employed. Including the exogenous variables enhanced the models' performance, especially for t + 4 (ρ ≈ 0.68 to ρ ≈ 0.74) and t + 8 (ρ ≈ 0.59 to ρ ≈ 0.66). The proposed LSTM-CVT method delivered promising results, as the best performing models per prediction horizon employed this new methodology. Additionally, for each parameter combination, it obtained lower error values than ANNs in 85% of the cases.
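The lagged datasets and prediction horizons described above can be illustrated with a small windowing helper. This is a sketch of the general idea for a single variable, not the authors' pipeline; the function name and the toy series are hypothetical.

```python
def make_lagged_dataset(series, window, horizon):
    """Build (inputs, targets) pairs from a univariate time series.

    Each input holds the `window` most recent values up to time t, and the
    matching target is the value at time t + horizon.
    """
    inputs, targets = [], []
    for t in range(window - 1, len(series) - horizon):
        inputs.append(series[t - window + 1 : t + 1])
        targets.append(series[t + horizon])
    return inputs, targets


# Toy hourly series standing in for NO2 concentrations.
series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
X, y = make_lagged_dataset(series, window=3, horizon=4)
```

With exogenous inputs, the same windowing would be applied to each additional pollutant or meteorological variable and the resulting lagged columns concatenated before feature ranking.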
7
A permutation entropy-based EMD–ANN forecasting ensemble approach for wind speed prediction. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-05141-w] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 10/23/2022]
8
Zaidan MA, Surakhi O, Fung PL, Hussein T. Sensitivity Analysis for Predicting Sub-Micron Aerosol Concentrations Based on Meteorological Parameters. SENSORS (BASEL, SWITZERLAND) 2020; 20:E2876. [PMID: 32438603 PMCID: PMC7285010 DOI: 10.3390/s20102876] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/24/2020] [Revised: 05/12/2020] [Accepted: 05/15/2020] [Indexed: 11/16/2022]
Abstract
Sub-micron aerosols are a vital air pollutant to measure because of their health effects. These particles are quantified as particle number concentration (PN). However, PN measurements are not always available at air quality measurement stations, leading to data scarcity. To compensate for this, PN modeling needs to be developed. This paper presents a PN modeling framework using sensitivity analysis, tested on a one-year aerosol measurement campaign conducted in Amman, Jordan. The method prepares a set of different combinations of all measured meteorological parameters to be descriptors of PN concentration. In this case, we resort to artificial neural networks in the forms of a feed-forward neural network (FFNN) and a time-delay neural network (TDNN) as modeling tools, and then we attempt to find the best descriptors using all these combinations as model inputs. The best modeling tools are FFNN for daily averaged data (with R² = 0.77) and TDNN for hourly averaged data (with R² = 0.66), where the best combination of meteorological parameters is found to be temperature, relative humidity, pressure, and wind speed. As the models follow the patterns of diurnal cycles well, the results are considered to be satisfactory. When PN measurements are not directly available or large amounts of PN concentration data are missing, PN models can be used to estimate PN concentration from the available measured meteorological parameters.
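The search over combinations of meteorological parameters can be sketched as an exhaustive subset enumeration. This is a minimal illustration, not the authors' code: the parameter list contains only the four variables named in the abstract (the campaign may have measured more), and `toy_score` is a hypothetical stand-in for training an FFNN or TDNN on a subset and returning its validation R².

```python
from itertools import combinations

# Only the parameters named in the abstract; the full measured set is an unknown.
PARAMETERS = ["temperature", "relative_humidity", "pressure", "wind_speed"]


def all_feature_subsets(features):
    """Yield every non-empty combination of the given features."""
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            yield subset


def best_subset(features, score):
    """Return the subset with the highest score."""
    return max(all_feature_subsets(features), key=score)


def toy_score(subset):
    """Hypothetical scorer: rewards two 'useful' inputs, penalizes subset size."""
    useful = {"temperature", "relative_humidity"}
    return len(useful & set(subset)) - 0.1 * len(subset)


subsets = list(all_feature_subsets(PARAMETERS))
winner = best_subset(PARAMETERS, toy_score)
```

The number of subsets grows as 2^n - 1, so exhaustive enumeration of this kind is practical only for a small number of candidate parameters.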
Affiliation(s)
- Martha A. Zaidan
  - Institute for Atmospheric and Earth System Research (INAR)/Physics, University of Helsinki, FI-00560 Helsinki, Finland.
- Ola Surakhi
  - Department of Computer Science, The University of Jordan, Amman 11942, Jordan.
- Pak Lun Fung
  - Institute for Atmospheric and Earth System Research (INAR)/Physics, University of Helsinki, FI-00560 Helsinki, Finland.
- Tareq Hussein
  - Institute for Atmospheric and Earth System Research (INAR)/Physics, University of Helsinki, FI-00560 Helsinki, Finland.
  - Department of Physics, The University of Jordan, Amman 11942, Jordan.