1
Yasin KH, Yasin MI, Iguala AD, Gelete TB, Kebede E. Methodological Integration of Machine Learning and Geospatial Analysis for PM10 Pollution Mapping. MethodsX 2025; 14:103322. [PMID: 40331028 PMCID: PMC12051153 DOI: 10.1016/j.mex.2025.103322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2025] [Accepted: 04/16/2025] [Indexed: 05/08/2025]
Abstract
Air pollution mitigation necessitates accurate spatial modelling to inform public health interventions. Traditional approaches inadequately capture complex predictor-pollutant interactions, whereas machine learning (ML) offers a superior capacity for modelling nonlinear relationships. This study compares three ML algorithms, Random Forest (RF), K-Nearest Neighbors (KNN), and Naïve Bayes (NB), using annual PM10 data from 11 monitoring stations alongside atmospheric, urban, and terrain covariates. The methodological framework employed rigorous preprocessing and cross-validation to classify pollution into three categorical levels. Results demonstrate RF's superior performance, achieving 94% balanced accuracy and 97% specificity, significantly outperforming KNN (92%) and NB (89%). RF excelled in capturing spatial heterogeneity and complex variable interactions, while KNN and NB exhibited limitations in managing feature dependencies and localized variability. Despite computational demands, the findings substantiate RF's reliability for robust air quality monitoring applications. The study contributes valuable insights for implementing scalable pollution prediction systems in resource-constrained urban environments while acknowledging interpretability challenges inherent to complex ML models.
- Preprocessing of spatial data from various sources, incorporating the handling of missing/abnormal data, analysis, and normalization
- Implementation of the three ML algorithms with rigorous hyperparameter tuning, model validation, and performance assessment
- Mapping PM10 hotspots on the gradient direction and distance from the city center
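The balanced accuracy and specificity figures quoted in this abstract come from a three-class classification. As a minimal sketch of how such metrics can be derived from a confusion matrix, the Python below uses one common definition (balanced accuracy as the mean of per-class sensitivities, specificity computed one-vs-rest). The abstract does not state which definition the authors used, and the function names and the example matrix are hypothetical.

```python
from typing import List, Tuple


def per_class_rates(cm: List[List[int]]) -> Tuple[List[float], List[float]]:
    """Return (sensitivity, specificity) per class, one-vs-rest.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    Assumes every class has at least one true and one non-member sample.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    sensitivity, specificity = [], []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true k, predicted other
        fp = sum(cm[i][k] for i in range(n)) - tp   # other, predicted k
        tn = total - tp - fn - fp
        sensitivity.append(tp / (tp + fn))
        specificity.append(tn / (tn + fp))
    return sensitivity, specificity


def balanced_accuracy(cm: List[List[int]]) -> float:
    """Mean of per-class sensitivities (macro-averaged recall)."""
    sensitivity, _ = per_class_rates(cm)
    return sum(sensitivity) / len(sensitivity)


# Hypothetical 3-level confusion matrix (low / moderate / high pollution).
example_cm = [[8, 1, 1],
              [1, 8, 1],
              [0, 2, 8]]
print(balanced_accuracy(example_cm))
print(per_class_rates(example_cm)[1])
```

Some toolkits instead report a per-class balanced accuracy as the average of sensitivity and specificity, so published values are only comparable when the definition is known.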
Affiliation(s)
- Kalid Hassen Yasin
- Geo-Information Science Program, School of Geography and Environmental Studies, Haramaya University, P.O. Box 138, 3220 Dire Dawa, Ethiopia
- Muaz Ismael Yasin
- School of Medicine, College of Health and Medical Sciences, Haramaya University, P.O. Box 235, Harar, Ethiopia
- Tadele Bedo Gelete
- Geo-Information Science Program, School of Geography and Environmental Studies, Haramaya University, P.O. Box 138, 3220 Dire Dawa, Ethiopia
- Erana Kebede
- School of Plant Sciences, College of Agriculture and Environmental Sciences, Haramaya University, P.O. Box 138, Dire Dawa, Ethiopia
2
Garbagna L, Babu Saheer L, Maktab Dar Oghaz M. AI-driven approaches for air pollution modelling: A comprehensive systematic review. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2025; 373:125937. [PMID: 40058557 DOI: 10.1016/j.envpol.2025.125937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2024] [Revised: 02/04/2025] [Accepted: 02/25/2025] [Indexed: 03/28/2025]
Abstract
In recent years, air quality levels have become a global issue with the rise of harmful pollutants and their effects on climate change. Urban areas are especially affected by air pollution, resulting in a deterioration of the environment and a surge in health complications. Many studies have used different methods to predict future pollution concentration levels accurately. This paper introduces the current physical models for air quality prediction and conducts an extensive systematic literature review on Machine Learning and Deep Learning techniques for predicting pollutants. This work compares different methodologies and techniques by grouping studies that utilise similar approaches together and comparing them. Furthermore, a distinction is made between temporal and spatiotemporal models to understand and highlight how both approaches impact future air pollutant concentration level predictions. The review differs from similar works in that it focuses not only on comparing models and approaches but also on analysing how the use of external features, such as meteorological data, traffic information, and land usage, affects pollutant levels and the model's accuracy in air quality forecasting. Performances and limitations are explored for both Machine and Deep Learning approaches, and the work offers a discussion on their comparison and possible future developments in this research space. This review highlights how Deep Learning models tend to be more suitable for forecasting problems due to their feature and spatio-temporal correlation representation abilities, and provides different directions for further work, from model utilisation to feature inclusion.
Affiliation(s)
- Lorenzo Garbagna
- Anglia Ruskin University, East Road, Cambridge, CB1 1PT, Cambridgeshire, United Kingdom.
- Lakshmi Babu Saheer
- Anglia Ruskin University, East Road, Cambridge, CB1 1PT, Cambridgeshire, United Kingdom
3
Clark LP, Zilber D, Schmitt C, Fargo DC, Reif DM, Motsinger-Reif AA, Messier KP. A review of geospatial exposure models and approaches for health data integration. JOURNAL OF EXPOSURE SCIENCE & ENVIRONMENTAL EPIDEMIOLOGY 2025; 35:131-148. [PMID: 39251872 PMCID: PMC12009742 DOI: 10.1038/s41370-024-00712-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 08/01/2024] [Accepted: 08/05/2024] [Indexed: 09/11/2024]
Abstract
BACKGROUND Geospatial methods are common in environmental exposure assessments and increasingly integrated with health data to generate comprehensive models of environmental impacts on public health. OBJECTIVE Our objective is to review geospatial exposure models and approaches for health data integration in environmental health applications. METHODS We conduct a literature review and synthesis. RESULTS First, we discuss key concepts and terminology for geospatial exposure data and models. Second, we provide an overview of workflows in geospatial exposure model development and health data integration. Third, we review modeling approaches, including proximity-based, statistical, and mechanistic approaches, across diverse exposure types, such as air quality, water quality, climate, and socioeconomic factors. For each model type, we provide descriptions, general equations, and example applications for environmental exposure assessment. Fourth, we discuss the approaches used to integrate geospatial exposure data and health data, such as methods to link data sources with disparate spatial and temporal scales. Fifth, we describe the landscape of open-source tools supporting these workflows.
Affiliation(s)
- Lara P Clark
- National Institute of Environmental Health Sciences, Office of the Scientific Director, Office of Data Science, Durham, NC, USA
- Daniel Zilber
- National Institute of Environmental Health Sciences, Division of Translational Toxicology, Predictive Toxicology Branch, Durham, NC, USA
- Charles Schmitt
- National Institute of Environmental Health Sciences, Office of the Scientific Director, Office of Data Science, Durham, NC, USA
- David C Fargo
- National Institute of Environmental Health Sciences, Office of the Director, Office of Environmental Science Cyberinfrastructure, Durham, NC, USA
- David M Reif
- National Institute of Environmental Health Sciences, Division of Translational Toxicology, Predictive Toxicology Branch, Durham, NC, USA
- Alison A Motsinger-Reif
- National Institute of Environmental Health Sciences, Division of Intramural Research, Biostatistics and Computational Biology Branch, Durham, NC, USA
- Kyle P Messier
- National Institute of Environmental Health Sciences, Division of Translational Toxicology, Predictive Toxicology Branch, Durham, NC, USA.
- National Institute of Environmental Health Sciences, Division of Intramural Research, Biostatistics and Computational Biology Branch, Durham, NC, USA.
4
Pak A, Rad AK, Nematollahi MJ, Mahmoudi M. Application of the Lasso regularisation technique in mitigating overfitting in air quality prediction models. Sci Rep 2025; 15:547. [PMID: 39747344 PMCID: PMC11696743 DOI: 10.1038/s41598-024-84342-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2024] [Accepted: 12/23/2024] [Indexed: 01/04/2025]
Abstract
As a significant global concern, air pollution triggers enormous challenges in public health and ecological sustainability, necessitating the development of precise algorithms to forecast and mitigate its impacts, which has led to the development of many machine learning (ML)-based models for predicting air quality. Meanwhile, overfitting is a prevalent issue with ML algorithms that decreases their efficacy and generalizability. The present investigation, using an extensive collection of data from 16 sensors in Tehran, Iran, from 2013 to 2023, focuses on applying the Least Absolute Shrinkage and Selection Operator (Lasso) regularisation technique to enhance the forecasting precision of ambient air pollutants concentration models, including particulate matter (PM2.5 and PM10), CO, NO2, SO2, and O3 while decreasing overfitting. The outputs were compared using the R-squared (R2), mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and normalised mean square error (NMSE) indices. Despite the preliminary findings revealing that Lasso dramatically enhances model reliability by decreasing overfitting and determining key attributes, the model's performance in predicting gaseous pollutants against PM remained unsatisfactory (R2 = 0.80 for PM2.5, 0.75 for PM10, 0.45 for CO, 0.55 for NO2, 0.65 for SO2, and 0.35 for O3). The minimal degree of missing data presumably explained the strong performance of the PM model, while the high dynamism of gases and their chemical interactions, in conjunction with the inherent characteristics of the model, were the primary factors contributing to the poor performance of the model. Simultaneously, the successful implementation of the Lasso regularisation approach in mitigating overfitting and selecting more important features makes it highly suggested for application in air quality forecasting models.
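To illustrate why Lasso both shrinks coefficients and selects features, the sketch below implements a textbook coordinate-descent Lasso in plain Python, minimising (1/(2n))·||y − Xw||² + alpha·||w||₁. It is not the authors' implementation, which the abstract does not describe; the function names, the toy data, and the value of `alpha` are assumptions chosen for illustration.

```python
from typing import List


def soft_threshold(rho: float, alpha: float) -> float:
    """Shrink rho towards zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X: List[List[float]], y: List[float],
                             alpha: float, n_iter: int = 100) -> List[float]:
    """Cyclic coordinate descent for the Lasso objective (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


# Hypothetical toy data: the target depends only on the first feature (y = 2 * x1).
X_toy = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y_toy = [2.0, 2.0, -2.0, -2.0]
print(lasso_coordinate_descent(X_toy, y_toy, alpha=0.5))
```

On this toy data the irrelevant second coefficient is driven exactly to zero while the first is shrunk below its unpenalised value of 2, which is the feature-selection and shrinkage behaviour the abstract credits with reducing overfitting.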
Affiliation(s)
- Abbas Pak
- Department of Computer Sciences, Shahrekord University, Shahrekord, Iran
- Abdullah Kaviani Rad
- Department of Environmental Engineering and Natural Resources, College of Agriculture, Shiraz University, Shiraz, 71946-85111, Iran
- Mohammadreza Mahmoudi
- Department of Statistics, Faculty of Science, Fasa University, Fasa, 74616-86131, Iran.
5
Valipour Shokouhi B, de Hoogh K, Gehrig R, Eeftens M. Spatiotemporal modelling of airborne birch and grass pollen concentration across Switzerland: A comparison of statistical, machine learning and ensemble methods. ENVIRONMENTAL RESEARCH 2024; 263:119999. [PMID: 39305973 DOI: 10.1016/j.envres.2024.119999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Revised: 08/31/2024] [Accepted: 09/12/2024] [Indexed: 09/28/2024]
Abstract
BACKGROUND Statistical and machine learning models are commonly used to estimate spatial and temporal variability in exposure to environmental stressors, supporting epidemiological studies. We aimed to compare the performances, strengths and limitations of six different algorithms in the retrospective spatiotemporal modeling of daily birch and grass pollen concentrations at a spatial resolution of 1 km across Switzerland. METHODS Daily birch and grass pollen concentrations were available from 14 measurement sites in Switzerland for 2000-2019. To develop the spatiotemporal models, we considered spatiotemporal, spatial and temporal predictors including meteorological factors, land-use, elevation, species distribution and Normalized Difference Vegetation Index (NDVI). We used six statistical and machine learning algorithms: LASSO, Ridge, Elastic net, Random forest, XGBoost and ANNs. We optimized model structures through feature selection and grid search techniques to obtain the best predictive performance. We used train-test split and cross-validation to avoid overfitting and overoptimistic performance indicators. We then combined these six models through multiple linear regression to develop an ensemble hybrid model. RESULTS The 5th-95th percentiles of birch and grass pollen concentrations were 0-151 and 0-105 grains/m3, respectively. The hybrid ensemble model achieved the best RMSE on the test dataset for both birch and grass pollen with 94.4 and 19.7 grains/m3, respectively. Nonlinear models (Random forest, XGBoost and ANNs) achieved lower test RMSEs than linear models (LASSO, Ridge, Elastic net) for both pollen types, with RMSEs ranging from 105.9 to 140.5 grains/m3 for birch and from 20.0 to 25.4 grains/m3 for grass pollen. The Random forest algorithm yielded the best spatial and temporal performance among the six evaluated modelling methods. The ensemble hybrid model outperformed the six linear and nonlinear algorithms. Country-wide pollen concentration, land use, weather, and NDVI were important predictors. CONCLUSION Nonlinear algorithms outperformed linear models and accurately explained complex, nonlinear relationships between environmental factors and measured concentrations.
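The ensemble step described in this abstract, combining base-model predictions through multiple linear regression, can be sketched as an ordinary least-squares fit of the observed values on the base predictions. The code below is a plain-Python illustration under stated assumptions: the function names are hypothetical, the toy numbers are invented, and it is assumed that the base predictions were produced on data the base models did not train on, which the abstract does not detail.

```python
from typing import List, Sequence


def solve_linear_system(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_stacking_weights(base_preds: Sequence[Sequence[float]],
                         y: Sequence[float]) -> List[float]:
    """Fit y ~ b0 + sum_k b_k * base_preds[k] via the normal equations."""
    n = len(y)
    cols = [[1.0] * n] + [list(p) for p in base_preds]  # intercept + one column per model
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * t for a, t in zip(ci, y)) for ci in cols]
    return solve_linear_system(A, rhs)


def predict_ensemble(coefs: Sequence[float],
                     base_preds: Sequence[Sequence[float]]) -> List[float]:
    """Apply the fitted intercept and weights to new base-model predictions."""
    n = len(base_preds[0])
    return [coefs[0] + sum(c * p[i] for c, p in zip(coefs[1:], base_preds))
            for i in range(n)]


# Hypothetical predictions from two base models and the observed values.
model_a = [1.0, 2.0, 3.0, 5.0]
model_b = [2.0, 1.0, 4.0, 4.0]
observed = [1.5, 1.5, 3.5, 4.5]
coefs = fit_stacking_weights([model_a, model_b], observed)
print(coefs)
```

The normal equations are adequate for a handful of base models; with many or highly correlated base learners, a regularised or QR-based fit would be the more stable choice.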
Affiliation(s)
- Behzad Valipour Shokouhi
- Swiss Tropical and Public Health Institute, Allschwil, Switzerland; University of Basel, Basel, Switzerland
- Kees de Hoogh
- Swiss Tropical and Public Health Institute, Allschwil, Switzerland; University of Basel, Basel, Switzerland
- Regula Gehrig
- Federal Office of Meteorology and Climatology MeteoSwiss, Switzerland
- Marloes Eeftens
- Swiss Tropical and Public Health Institute, Allschwil, Switzerland; University of Basel, Basel, Switzerland.
6
Vachon J, Buteau S, Liu Y, Van Ryswyk K, Hatzopoulou M, Smargiassi A. Spatial and spatiotemporal modelling of intra-urban ultrafine particles: A comparison of linear, nonlinear, regularized, and machine learning methods. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 954:176523. [PMID: 39326743 DOI: 10.1016/j.scitotenv.2024.176523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2024] [Revised: 09/09/2024] [Accepted: 09/23/2024] [Indexed: 09/28/2024]
Abstract
BACKGROUND Machine learning methods are proposed to improve the predictions of ambient air pollution, yet few studies have compared ultrafine particles (UFP) models across a broad range of statistical and machine learning approaches, and only one compared spatiotemporal models. Most reported marginal differences between methods. This limits our ability to draw conclusions about the best methods to model ambient UFPs. OBJECTIVE To compare the performance and predictions of statistical and machine learning methods used to model spatial and spatiotemporal ambient UFPs. METHODS Daily and annual models were developed from UFP measurements from a year-long mobile monitoring campaign in Quebec City, Canada, combined with 262 geospatial and six meteorological predictors. Various road segment lengths were considered (100/300/500 m) for UFP data aggregation. Four statistical methods included linear, non-linear, and regularized regressions, whereas eight machine learning regressions utilized tree-based, neural network, support vector, and kernel ridge algorithms. Nested cross-validation was used for model training, hyperparameter tuning and performance evaluation. RESULTS The mean annual UFP concentration was 13,335 particles/cm3. Machine learning outperformed statistical methods in predicting UFPs. Tree-based methods performed best across temporal scales and segment lengths, with XGBoost producing the overall best performing models (annual R2 = 0.78-0.86, RMSE = 2163-2169 particles/cm3; daily R2 = 0.47-0.48, RMSE = 8651-11,422 particles/cm3). With 100 m segments, other annual models performed similarly well, but their prediction surfaces of annual mean UFP concentrations showed signs of overfitting. Spatial aggregation of monitoring data significantly impacted model performance. Longer segments yielded lower RMSE in all daily models and for annual statistical models, but not for annual machine learning models. CONCLUSIONS The use of tree-based methods significantly improved spatiotemporal predictions of UFP concentrations, and to a lesser extent annual concentrations. Segment length and hyperparameter tuning had notable impacts on model performance and should be considered in future studies.
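Nested cross-validation, mentioned in the methods of this abstract, separates hyperparameter tuning (inner loop) from performance evaluation (outer loop) so that the reported error is not biased by the tuning. The sketch below shows the loop structure in plain Python using a toy one-dimensional k-nearest-neighbour regressor as a stand-in model; the model, the fold scheme, the function names, and the data are all assumptions for illustration and are not taken from the study.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (feature, target)


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]


def knn_predict(train: Sequence[Point], x: float, k: int) -> float:
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(t for _, t in nearest) / len(nearest)


def mse(train: Sequence[Point], test: Sequence[Point], k: int) -> float:
    """Mean squared error of the k-NN stand-in model on the test points."""
    return sum((knn_predict(train, x, k) - t) ** 2 for x, t in test) / len(test)


def nested_cv(data: Sequence[Point], candidate_ks: Sequence[int],
              outer_k: int = 3, inner_k: int = 2) -> List[Tuple[int, float]]:
    """Return (chosen hyperparameter, held-out MSE) for each outer fold."""
    results = []
    for test_idx in k_fold_indices(len(data), outer_k):
        held_out = set(test_idx)
        outer_train = [data[i] for i in range(len(data)) if i not in held_out]
        outer_test = [data[i] for i in test_idx]

        def inner_score(k: int) -> float:
            # Tune using only the outer-training data; the outer test fold is never seen.
            total = 0.0
            for val_idx in k_fold_indices(len(outer_train), inner_k):
                val = set(val_idx)
                inner_train = [outer_train[i] for i in range(len(outer_train))
                               if i not in val]
                inner_val = [outer_train[i] for i in val_idx]
                total += mse(inner_train, inner_val, k)
            return total / inner_k

        best_k = min(candidate_ks, key=inner_score)
        results.append((best_k, mse(outer_train, outer_test, best_k)))
    return results


# Hypothetical data: a noiseless linear relationship.
toy_data = [(float(x), 2.0 * x) for x in range(12)]
print(nested_cv(toy_data, candidate_ks=[1, 2, 3]))
```

The outer-fold errors are the performance estimate; the hyperparameter chosen can differ between outer folds, which is expected with this design.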
Affiliation(s)
- Julien Vachon
- Department of Environmental and Occupational Health, School of Public Health, University of Montreal, Montreal, Canada; Center for Public Health Research (CReSP), University of Montreal and CIUSSS du Centre-Sud-de-l'Île-de-Montréal, Montreal, Canada
- Stéphane Buteau
- Department of Environmental and Occupational Health, School of Public Health, University of Montreal, Montreal, Canada; Center for Public Health Research (CReSP), University of Montreal and CIUSSS du Centre-Sud-de-l'Île-de-Montréal, Montreal, Canada
- Ying Liu
- Department of Environmental and Occupational Health, School of Public Health, University of Montreal, Montreal, Canada
- Keith Van Ryswyk
- Air Pollution Exposure Science Section, Water and Air Quality Bureau, Health Canada, Ottawa, Canada
- Audrey Smargiassi
- Department of Environmental and Occupational Health, School of Public Health, University of Montreal, Montreal, Canada; Center for Public Health Research (CReSP), University of Montreal and CIUSSS du Centre-Sud-de-l'Île-de-Montréal, Montreal, Canada.
7
Chen F, Liu X, Lu C, Ruan M, Wen Y, Wang S, Song Y, Li L, Zhou L, Jiang H, Wu L. High-throughput prediction of stalk cellulose and hemicellulose content in maize using machine learning and Fourier transform infrared spectroscopy. BIORESOURCE TECHNOLOGY 2024; 413:131531. [PMID: 39321938 DOI: 10.1016/j.biortech.2024.131531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 08/23/2024] [Accepted: 09/22/2024] [Indexed: 09/27/2024]
Abstract
Cellulose and hemicellulose are key cross-linked carbohydrates affecting bioethanol production in maize stalks. Traditional wet chemical methods for their detection are labor-intensive, highlighting the need for high-throughput techniques. This study used Fourier transform infrared (FTIR) spectroscopy combined with machine learning (ML) algorithms on 200 large-scale maize germplasms to develop robust predictive models for stalk cellulose, hemicellulose and holocellulose content. We identified several peak height features correlated with the three contents and used them as input data for model building. Four ML algorithms demonstrated higher predictive accuracy, achieving coefficients of determination (R2) ranging from 0.83 to 0.97. Notably, the Categorical Boosting algorithm yielded optimal models with R2 exceeding 0.91 for the training set and over 0.81 for the test set. The approach combining FTIR spectroscopy with ML algorithms offers a precise and high-throughput tool for predicting stalk cellulose, hemicellulose and holocellulose contents, benefiting maize genetic breeding for bioenergy and biofuels.
Affiliation(s)
- Fanghui Chen
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
- Xing Liu
- School of Materials and Chemistry, Anhui Agricultural University, Hefei, Anhui 230036, China
- Chengchen Lu
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
- Mingxiu Ruan
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
- Yujing Wen
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
- Shaodong Wang
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
- Youhong Song
- School of Agronomy, Anhui Agricultural University, Hefei, 230036, China
- Lin Li
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Liang Zhou
- School of Materials and Chemistry, Anhui Agricultural University, Hefei, Anhui 230036, China
- Haiyang Jiang
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China.
- Leiming Wu
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China.
8
Gebler D, Segurado P, Ferreira MT, Aguiar FC. Predicting freshwater biological quality using macrophytes: A comparison of empirical modelling approaches. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2024; 31:65092-65108. [PMID: 39567452 PMCID: PMC11624229 DOI: 10.1007/s11356-024-35497-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Accepted: 10/29/2024] [Indexed: 11/22/2024]
Abstract
Difficulties have hampered bioassessment in southern European rivers due to limited reference data and the unclear impact of multiple interacting stressors on plant communities. Predictive modelling may help overcome this limitation by aggregating different pressures affecting aquatic organisms and showing the most influential factors. We assembled a dataset of 292 Mediterranean sampling locations on perennial rivers and streams (mainland Portugal) with macrophyte and environmental data. We compared models based on multiple linear regression (MLR), boosted regression trees (BRT) and artificial neural networks (ANNs). Secondarily, we investigated the relationship between two macrophyte indices grounded in distinct conceptual premises (the Riparian Vegetation Index - RVI, and the Macrophyte Biological Index for Rivers - IBMR) and a set of environmental variables, including climatic conditions, geographical characteristics, land use, water chemistry and habitat quality of rivers. The quality of models for the IBMR was superior to those for the RVI in all cases, which indicates a better ecological linkage of IBMR with the stressor and abiotic variables. The IBMR using ANN outperformed the BRT models, for which the r-Pearson correlation coefficients were 0.877 and 0.801, and the normalised root mean square errors were 10.0 and 11.3, respectively. Variable importance analysis revealed that longitude and geology, hydrological/climatic conditions, water body size and land use had the highest impact on the IBMR model predictions. Despite the differences in the quality of the models, all showed similar importance to individual input variables, although in a different order. Despite some difficulties in model training for ANNs, our findings suggest that BRT and ANNs can be used to assess ecological quality, and for decision-making on the environmental management of rivers.
Affiliation(s)
- Daniel Gebler
- Department of Ecology and Environmental Protection, Poznan University of Life Sciences, Wojska Polskiego 28, 60-637, Poznan, Poland.
- Pedro Segurado
- Forest Research Centre, Associate Laboratory TERRA, School of Agriculture, University of Lisbon, Lisbon, Portugal
- Maria Teresa Ferreira
- Forest Research Centre, Associate Laboratory TERRA, School of Agriculture, University of Lisbon, Lisbon, Portugal
- Francisca C Aguiar
- Forest Research Centre, Associate Laboratory TERRA, School of Agriculture, University of Lisbon, Lisbon, Portugal
9
Li Y, Huang T, Lee HF, Heo Y, Ho KF, Yim SHL. Integrating Doppler LiDAR and machine learning into land-use regression model for assessing contribution of vertical atmospheric processes to urban PM2.5 pollution. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 952:175632. [PMID: 39168320 DOI: 10.1016/j.scitotenv.2024.175632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 08/06/2024] [Accepted: 08/17/2024] [Indexed: 08/23/2024]
Abstract
Air pollution has been recognized as a global issue, through adverse effects on the environment and health. While vertical atmospheric processes substantially affect urban air pollution, traditional epidemiological research using Land-use regression (LUR) modeling usually focused on ground-level attributes without considering upper-level atmospheric conditions. This study aimed to integrate Doppler LiDAR and machine learning techniques into LUR models (LURF-LiDAR) to comprehensively evaluate urban air pollution in Hong Kong, and to assess complex interactions between vertical atmospheric processes and urban air pollution from long-term (i.e., annual) and short-term (i.e., two air pollution episodes) views in 2021. The results demonstrated significant improvements in model performance, achieving CV R2 values of 0.81 (95 % CI: 0.75-0.86) for the long-term PM2.5 prediction model and 0.90 (95 % CI: 0.87-0.91) for the short-term models. Approximately 69 % of ground-level air pollution arose from the mixing of ground- and lower-level (105 m-225 m) particles, while 21 % was associated with upper-level (825 m-945 m) atmospheric processes. The identified transboundary air pollution (TAP) layer was located at ~900 m above the ground. The identified Episode one (E1: 7 Jan-22 Jan) was induced by the accumulation of local emissions under stable atmospheric conditions, whereas Episode two (E2: 13 Dec-24 Dec) was regulated by TAP under unstable and turbulent conditions. Our improved air quality prediction model is accurate and comprehensive with high interpretability for supporting urban planning and air quality policies.
Affiliation(s)
- Yue Li
- Department of Geography and Resource Management, The Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong 999077, China
- Tao Huang
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 639798, Singapore; Earth Observatory of Singapore, Nanyang Technological University, Singapore 639798, Singapore
- Harry Fung Lee
- Department of Geography and Resource Management, The Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong 999077, China
- Yeonsook Heo
- School of Civil, Environmental and Architectural Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, Republic of Korea
- Kin-Fai Ho
- The Jockey Club School of Public Health and Primary Care, The Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong 999077, China
- Steve H L Yim
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 639798, Singapore; Earth Observatory of Singapore, Nanyang Technological University, Singapore 639798, Singapore; Asian School of the Environment, Nanyang Technological University, Singapore 639798, Singapore.
10
Wen Y, Liu X, He F, Shi Y, Chen F, Li W, Song Y, Li L, Jiang H, Zhou L, Wu L. Machine learning prediction of stalk lignin content using Fourier transform infrared spectroscopy in large scale maize germplasm. Int J Biol Macromol 2024; 280:136140. [PMID: 39349086 DOI: 10.1016/j.ijbiomac.2024.136140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Revised: 09/24/2024] [Accepted: 09/27/2024] [Indexed: 10/02/2024]
Abstract
Lignin has been recognized as a major factor contributing to lignocellulosic recalcitrance in biofuel production and has attracted attention as a high-value product in the biorefinery field. As the traditional wet chemical methods for detecting lignin content are labor-intensive, time-consuming and environmentally toxic, there is an urgent need to develop high-throughput and environment-friendly techniques for large-scale screening of crop germplasms. In this study, we conducted a Fourier transform infrared (FTIR) assay on 150 maize germplasms with a diverse lignin composition to build predictive models for lignin content in maize stalk. Principal component analysis (PCA) was applied to the FTIR spectra for use as model inputs. Classification and advanced gradient boosting machine (GBM) algorithms demonstrated higher predictive accuracy (0.82-0.96) compared to traditional linear and regularization algorithms (0.03-0.04) in the training set. Notably, two optimal models, built using the extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM) algorithms, achieved R2 values of over 0.91 in the training set and over 0.82 in the test set. Overall, the combination of FTIR and machine learning (ML) algorithms offers a high-throughput and efficient method for predicting lignin content. This approach holds significant potential for genetic breeding and the effective utilization of maize in industrial production.
Affiliation(s)
- Yujing Wen
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
- Xing Liu
- School of Materials and Chemistry, Anhui Agricultural University, Hefei, Anhui 230036, China
- Feng He
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
- Yanli Shi
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
- Fanghui Chen
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
- Wenfei Li
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
- Youhong Song
- School of Agronomy, Anhui Agricultural University, Hefei 230036, China
- Lin Li
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Haiyang Jiang
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
- Liang Zhou
- School of Materials and Chemistry, Anhui Agricultural University, Hefei, Anhui 230036, China.
- Leiming Wu
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China.
11
|
Su JG, Shahriary E, Sage E, Jacobsen J, Park K, Mohegh A. Development of over 30-years of high spatiotemporal resolution air pollution models and surfaces for California. ENVIRONMENT INTERNATIONAL 2024; 193:109100. [PMID: 39520932 DOI: 10.1016/j.envint.2024.109100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2024] [Revised: 10/22/2024] [Accepted: 10/24/2024] [Indexed: 11/16/2024]
Abstract
California's diverse geography and meteorological conditions necessitate models capturing fine-grained patterns of air pollution distribution. This study presents the development of high-resolution (100 m) daily land use regression (LUR) models spanning 1989-2021 for nitrogen dioxide (NO2), fine particulate matter (PM2.5), and ozone (O3) across California. These machine learning LUR algorithms integrated comprehensive data sources, including traffic, land use, land cover, meteorological conditions, vegetation dynamics, and satellite data. The modeling process incorporated historical air quality observations utilizing continuous regulatory, fixed site saturation, and Google Streetcar mobile monitoring data. The model performance (adjusted R2) for NO2, PM2.5, and O3 was 84 %, 65 %, and 92 %, respectively. Over the years, NO2 concentrations showed a consistent decline, attributed to regulatory efforts and reduced human activities on weekends. Traffic density and weather conditions significantly influenced NO2 levels. PM2.5 concentrations also decreased over time, influenced by aerosol optical depth (AOD), traffic density, weather, and land use patterns, such as developed open spaces and vegetation. Industrial activities and residential areas contributed to higher PM2.5 concentrations. O3 concentrations exhibited no significant annual trend, with higher levels observed on weekends and lower levels associated with traffic density due to the scavenger effect. Weather conditions and land use, such as commercial areas and water bodies, influenced O3 concentrations. To extend the prediction of daily NO2, PM2.5, and O3 to 1989, models were developed for predictors such as daily road traffic, normalized difference vegetation index (NDVI), Ozone Monitoring Instrument (OMI)-NO2, monthly AOD, and OMI-O3. These models enabled effective estimation for any period with known daily weather conditions. 
Longitudinal analysis revealed a consistent NO2 decline, regulatory-driven PM2.5 decreases countered by wildfire impacts, and spatially variable O3 concentrations with no long-term trend. This study enhances understanding of air pollution trends, aiding in identifying lifetime exposure for statewide populations and supporting informed policy decisions and environmental justice advocacy.
Affiliation(s)
- Jason G Su
- School of Public Health, University of California, Berkeley Berkeley, CA 94720 the United States of America.
| | - Eahsan Shahriary
- School of Public Health, University of California, Berkeley Berkeley, CA 94720 the United States of America
| | - Emma Sage
- School of Public Health, University of California, Berkeley Berkeley, CA 94720 the United States of America
| | - John Jacobsen
- School of Public Health, University of California, Berkeley Berkeley, CA 94720 the United States of America
| | - Katherine Park
- School of Public Health, University of California, Berkeley Berkeley, CA 94720 the United States of America
| | - Arash Mohegh
- Research Division, California Air Resources Board, Sacramento, CA 95812, the United States of America
|
12
|
Abdillah SFI, You SJ, Wang YF. Characterizing sector-oriented roadside exposure to ultrafine particles (PM 0.1) via machine learning models: Implications of covariates influences on sectors variability. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2024; 359:124595. [PMID: 39053804 DOI: 10.1016/j.envpol.2024.124595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 07/17/2024] [Accepted: 07/21/2024] [Indexed: 07/27/2024]
Abstract
Ultrafine particles (UFPs; PM0.1) pose an intensified health risk because of their small size and unique spatial variability. One of the major emission sources of UFPs is vehicle exhaust, which varies with the traffic composition in each type of roadside sector. A current challenge in epidemiological studies of UFPs is the limited ability to characterize them, owing to the expense of the instruments. This study assessed the exposure dose of UFP particle number concentrations (UFPs PNC) for typical healthy adults and children at three roadside sectors: industrial roadside (IN), residential roadside (RS), and urban background (UB). Furthermore, this study developed and used machine learning (ML) algorithms that could accurately characterize the UFPs exposure dose and explain the effects of covariates on the model outputs, representing the intra-urban variability of UFPs between sectors. The average inhaled UFPs doses during the off-peak season (warm period) at IN, RS, and UB were 1.71 ± 0.19 × 10¹⁰, 1.28 ± 0.22 × 10¹⁰, and 1.09 ± 0.18 × 10¹⁰ #/hour for healthy adults and 1.33 ± 0.15 × 10¹⁰, 0.99 ± 0.17 × 10¹⁰, and 0.86 ± 0.14 × 10¹⁰ #/hour for children. Inhaled UFPs were mainly deposited in the tracheobronchial (TB) respiratory fraction for adults (67.7%) and in the alveolar (ALV) fraction for children (67.5%). Among the three ML algorithms implemented in this study, XGBoost showed the best UFPs PNC exposure dose estimation performance, with R2 = 0.965, 0.959, and 0.929 and RMSE = 0.79 × 10⁸, 0.54 × 10⁸, and 0.15 × 10⁵ #/hour at IN, RS, and UB, followed by multiple linear regression (MLR) and random forest (RF). Furthermore, SHAP analysis of the XGBoost model identified the spatial variability of each roadside sector by quantifying the approximate contributions of covariates to the model's output. The findings highlight the potential use of ML models as an alternative for preliminary particle exposure source apportionment.
Affiliation(s)
- Sultan F I Abdillah
- Department of Civil Engineering, Chung Yuan Christian University, Zhongli, Taoyuan, 32023, Taiwan; Department of Environmental Engineering, Chung Yuan Christian University, Zhongli, Taoyuan, 32023, Taiwan; Center for Environmental Risk Management, Chung Yuan Christian University, Zhongli, Taoyuan, 32023, Taiwan
| | - Sheng-Jie You
- Department of Environmental Engineering, Chung Yuan Christian University, Zhongli, Taoyuan, 32023, Taiwan; Center for Environmental Risk Management, Chung Yuan Christian University, Zhongli, Taoyuan, 32023, Taiwan
| | - Ya-Fen Wang
- Department of Environmental Engineering, Chung Yuan Christian University, Zhongli, Taoyuan, 32023, Taiwan; Sustainable Environmental Education Center, Chung Yuan Christian University, Zhongli, Taoyuan, 32023, Taiwan.
|
13
|
Wei P, Hao S, Shi Y, Anand A, Wang Y, Chu M, Ning Z. Combining Google traffic map with deep learning model to predict street-level traffic-related air pollutants in a complex urban environment. ENVIRONMENT INTERNATIONAL 2024; 191:108992. [PMID: 39250881 DOI: 10.1016/j.envint.2024.108992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 08/26/2024] [Accepted: 08/29/2024] [Indexed: 09/11/2024]
Abstract
BACKGROUND Traffic-related air pollution (TRAP) is a major contributor to urban pollution and varies sharply at the street level, posing a challenge for air quality modeling. Traditional land use regression models combined with data from fixed monitoring stations may be unable to predict and characterize fine-scale TRAP, especially in complex urban environments influenced by various features. This study aims to estimate fine-scale (50 m) concentrations of nitrogen oxides (NO and NO₂) in Hong Kong using a deep learning (DL) structured model. METHODS We collected data from mobile air quality sensors on buses and crowd-sourced Google real-time traffic status as a proxy for real-time traffic emissions. Our DL model was compared with existing machine learning models to assess performance improvements. Using an interpretable machine learning method, we hierarchically evaluated the global, local, and interaction effects for different features. RESULTS Our DL model outperformed existing machine learning models, achieving R2 values of 0.72 for NO and 0.69 for NO₂. The incorporation of traffic status as a key predictor improved model performance by 9% to 17%. The interpretable machine learning method revealed the importance of traffic-related features and their pairwise interactions. CONCLUSION The results indicate that traffic-related features significantly contribute to TRAP and provide insights and guidance for urban planning. By incorporating crowd-sourced Google traffic information, we assessed traffic abatement scenarios that could inform targeted strategies for improving urban air quality.
Affiliation(s)
- Peng Wei
- College of Geography and Environment, Shandong Normal University, Jinan, China; Division of Environment and Sustainability, The Hong Kong University of Science and Technology, Hong Kong, China
| | - Song Hao
- State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China.
| | - Yuan Shi
- Department of Geography & Planning, University of Liverpool, Liverpool, UK.
| | - Abhishek Anand
- Department of Mechanical Engineering, Carnegie Mellon University, United States
| | - Ya Wang
- Division of Environment and Sustainability, The Hong Kong University of Science and Technology, Hong Kong, China
| | - Mengyuan Chu
- Division of Environment and Sustainability, The Hong Kong University of Science and Technology, Hong Kong, China
| | - Zhi Ning
- Division of Environment and Sustainability, The Hong Kong University of Science and Technology, Hong Kong, China.
|
14
|
Zhao R, Wang G, Li F, Wang J, Zhang Y, Li D, Liu S, Li J, Song J, Wei F, Wang C. Developing Machine Learning-Based Predictive Models for Hallux Valgus Recurrence Based on Measurements From Radiographs. Foot Ankle Int 2024; 45:1000-1008. [PMID: 38872342 DOI: 10.1177/10711007241256648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2024]
Abstract
BACKGROUND Machine learning (ML) is increasingly used to predict the prognosis of numerous diseases. This retrospective analysis aimed to develop a prediction model using ML algorithms and to identify predictors associated with the recurrence of hallux valgus (HV) following surgery. METHODS A total of 198 symptomatic feet that underwent chevron osteotomy combined with a distal soft tissue procedure were enrolled and analyzed from 2 independent medical centers. The feet were grouped according to nonrecurrence or recurrence based on 1-year follow-up outcomes. Preoperative weightbearing radiographs and immediate postoperative nonweightbearing radiographs were obtained for each HV foot. Radiographic measurements (eg, HV angle and intermetatarsal angle) were acquired and used for ML model training. A total of 9 commonly used ML models were trained on the data obtained from one institute (108 feet), and tested on the other data set from another independent institute (90 feet) for external validation. Optimal feature sets for each model were identified based on a 2000-resample bootstrap-based internal validation via an exhaustive search. The performance of each model was then tested on the external validation set. The area under the curve (AUC), classification accuracy, sensitivity, and specificity of each model were calculated to evaluate the performance of each model. RESULTS The support vector machine (SVM) model showed the highest predictive accuracy compared to other methods, with an AUC of 0.88 and an accuracy of 75.6%. Preoperative hallux valgus angle, tibial sesamoid position, postoperative intermetatarsal angle, and postoperative tibial sesamoid position were identified as the most selected features by several ML models. CONCLUSION ML classifiers such as SVM could predict the recurrence of HV (an HVA >20 degrees) at a 1-year follow-up while identifying associated predictors in a multivariate manner. 
This study holds the potential for foot and ankle surgeons to effectively identify individuals at higher risk of HV recurrence postsurgery.
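As a rough illustration of the evaluation metrics named in this abstract, the sketch below computes an AUC from predicted scores and a bootstrap percentile interval around it. The labels, scores, function names, and the use of the bootstrap for an interval are assumptions made for illustration; the study itself used its 2000-resample bootstrap for internal validation during feature selection, which is not reproduced here.

```python
import random


def auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Each positive/negative pair counts 1 for a win and 0.5 for a tie.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(y_true, scores, n_resamples=2000, seed=0):
    """Percentile 95% interval of the AUC over bootstrap resamples of cases."""
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    while len(values) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[i] for i in idx]
        if 0 < sum(labels) < n:  # AUC needs both classes in the resample
            values.append(auc(labels, [scores[i] for i in idx]))
    values.sort()
    return values[int(0.025 * n_resamples)], values[int(0.975 * n_resamples) - 1]


# Hypothetical data: 1 = recurrence, 0 = no recurrence.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]

point = auc(y_true, scores)
low, high = bootstrap_auc_interval(y_true, scores)
print(point, low, high)
```

With so few cases the interval is very wide, which is one reason an external validation set, as used in the study, is informative.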
Affiliation(s)
- Rui Zhao
- Department of Orthopedic Surgery, Tianjin Medical University General Hospital, Tianjin, China
| | - Guobin Wang
- Department of Orthopedic Surgery, Tianjin Medical University General Hospital, Tianjin, China
| | - Fengtan Li
- Department of Radiology, Tianjin Medical University General Hospital, Tianjin, China
| | - Jinchan Wang
- Department of Dermatology, Tianjin Medical University General Hospital, Tianjin, China
| | - Yuan Zhang
- Department of Orthopedic Surgery, Tianjin Medical University General Hospital, Tianjin, China
| | - Dong Li
- Department of Orthopedic Surgery, Tianjin Medical University General Hospital, Tianjin, China
| | - Shen Liu
- Department of Orthopedic Surgery, Tianjin Medical University General Hospital, Tianjin, China
| | - Jie Li
- Graduate School, Tianjin Medical University, Tianjin, China
| | - Jiajun Song
- Department of Orthopedic Surgery, Tianjin Medical University General Hospital, Tianjin, China
| | - Fangyuan Wei
- Department of Hand and Foot Surgery, Beijing University of Chinese Medicine Third Affiliated Hospital, Beijing, China
- Engineering Research Center of Chinese Orthopaedic and Sports Rehabilitation Artificial Intelligent, Ministry of Education, Beijing, China
| | - Chenguang Wang
- Department of Orthopedic Surgery, Tianjin Medical University General Hospital, Tianjin, China
|
15
|
Hu T, Li K, Ma C, Zhou N, Chen Q, Qi C. Improved classification of soil As contamination at continental scale: Resolving class imbalances using machine learning approach. CHEMOSPHERE 2024; 363:142697. [PMID: 38925515 DOI: 10.1016/j.chemosphere.2024.142697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 06/11/2024] [Accepted: 06/23/2024] [Indexed: 06/28/2024]
Abstract
The identification of arsenic (As)-contaminated areas is an important prerequisite for soil management and reclamation. Although previous studies have attempted to identify soil As contamination via machine learning (ML) methods combined with soil spectroscopy, they have ignored the rarity of As-contaminated soil samples, leading to an imbalanced learning problem. A novel ML framework was thus designed herein to solve the imbalance issue in identifying soil As contamination from soil visible and near-infrared spectra. Spectral preprocessing, imbalanced dataset resampling, and model comparisons were combined in the ML framework, and the optimal combination was selected based on the recall. In addition, Bayesian optimization was used to tune the model hyperparameters. The optimized model achieved recall, area under the curve, and balanced accuracy values of 0.83, 0.88, and 0.79, respectively, on the testing set. The recall was further improved to 0.87 with the threshold adjustment, indicating the model's excellent performance and generalization capability in classifying As-contaminated soil samples. The optimal model was applied to a global soil spectral dataset to predict areas at a high risk of soil As contamination on a global scale. The ML framework established in this study represents a milestone in the classification of soil As contamination and can serve as a valuable reference for contamination management in soil science.
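As a rough illustration of the metrics and the threshold adjustment mentioned in this abstract, the sketch below computes recall and balanced accuracy from predicted probabilities at two decision thresholds. The labels, scores, thresholds, and function names are made up for illustration and are not taken from the study.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def recall_and_balanced_accuracy(y_true, scores, threshold=0.5):
    """Return (recall, balanced accuracy) at the given decision threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    recall = tp / (tp + fn)        # share of contaminated samples found
    specificity = tn / (tn + fp)   # share of clean samples correctly cleared
    return recall, (recall + specificity) / 2


# Hypothetical imbalanced sample: 3 contaminated (1) vs 7 clean (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.45, 0.3, 0.2, 0.1, 0.05, 0.35, 0.15]

default = recall_and_balanced_accuracy(y_true, scores, threshold=0.5)
lowered = recall_and_balanced_accuracy(y_true, scores, threshold=0.4)
print(default, lowered)
```

In this toy data, lowering the threshold recovers the one missed contaminated sample at the cost of one false positive, which is the kind of trade-off that selecting a model on recall accepts.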
Affiliation(s)
- Tao Hu
- School of Resources and Safety Engineering, Central South University, Changsha, 410083, China
| | - Kechao Li
- School of Resources and Safety Engineering, Central South University, Changsha, 410083, China
| | - Chundi Ma
- School of Resources and Safety Engineering, Central South University, Changsha, 410083, China
| | - Nana Zhou
- School of Resources and Safety Engineering, Central South University, Changsha, 410083, China
| | - Qiusong Chen
- School of Resources and Safety Engineering, Central South University, Changsha, 410083, China
| | - Chongchong Qi
- School of Resources and Safety Engineering, Central South University, Changsha, 410083, China; School of Metallurgy and Environment, Central South University, Changsha, 410083, China; Fankou Lead-Zinc Mine, NONFEMET, Shaoguan, 511100, China.
|
16
|
Huang Y, Wang Q, Ou X, Sheng D, Yao S, Wu C, Wang Q. Identification of response regulation governing ozone formation based on influential factors using a random forest approach. Heliyon 2024; 10:e36303. [PMID: 39224321 PMCID: PMC11367417 DOI: 10.1016/j.heliyon.2024.e36303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 08/04/2024] [Accepted: 08/13/2024] [Indexed: 09/04/2024] Open
Abstract
The pursuit of more scientific, refined, and precise ozone and air quality control continues to pose significant challenges. Using data visualization techniques and random forest (RF) algorithms, the temporal distribution of atmospheric pollutants and the interrelationship between O3 concentration and its influential factors were investigated with one year of monitoring data from Deqing county in 2021. The local atmospheric conditions predominantly fell within the NOx-sensitive and transition zones. Extremely high O3 concentrations were primarily observed when temperatures (T) exceeded 30 °C, with relative humidity (RH) ranging between 30 and 60 %. NO2, RH, and T were identified as the three most important factors; O3 concentration had a stronger linear relationship with RH and T and a stronger nonlinear relationship with NO2. By employing an optimized RF model under consistently controlled mild and high-reaction atmospheric conditions, the response of O3 concentration to changes in individual influencing factors was obtained. The O3 concentration increased and then decreased in response to increasing NO2 concentration, displaying a characteristic inflection point at 10 μg m⁻³. More reactive radicals were produced at higher VOC concentrations, and the NOx cycle continued at lower NO2 concentrations, accelerating O3 production. Therefore, the significantly different O3 responses to variations in VOC and NOx concentrations between mild and high-reaction atmospheric conditions, as well as the existence of oxidant elevation, should be considered in local air quality control. This study demonstrates the efficacy of ML methods in simulating the nonlinear response of O3, supports the understanding of local O3 formation, and offers quick guidance for precise local O3 pollution control and related strategies.
Affiliation(s)
- Yan Huang
- Ecological Environmental Monitoring Station of Deqing County, Huzhou, 313200, China
- College of Environment, Zhejiang University of Technology, Hangzhou, 310032, China
| | - Qingqing Wang
- Ecological Environmental Monitoring Station of Deqing County, Huzhou, 313200, China
| | - Xiaojie Ou
- College of Environment, Zhejiang University of Technology, Hangzhou, 310032, China
| | - Dongping Sheng
- College of Environment, Zhejiang University of Technology, Hangzhou, 310032, China
| | - Shengdong Yao
- College of Environment, Zhejiang University of Technology, Hangzhou, 310032, China
| | - Chengzhi Wu
- Trinity Consultants, Inc. (China Office), Hangzhou, 310012, China
| | - Qiaoli Wang
- College of Environment, Zhejiang University of Technology, Hangzhou, 310032, China
|
17
|
Asri AK, Newman GD, Tao Z, Zhu R, Chen HL, Lung SCC, Wu CD. What is the spatiotemporal pattern of benzene concentration spread over susceptible area surrounding the Hartman Park community, Houston, Texas? JOURNAL OF HAZARDOUS MATERIALS 2024; 474:134666. [PMID: 38815389 PMCID: PMC11975435 DOI: 10.1016/j.jhazmat.2024.134666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Revised: 05/07/2024] [Accepted: 05/19/2024] [Indexed: 06/01/2024]
Abstract
The Hartman Park community in Houston, Texas, USA, is in a highly polluted area, which poses significant risks to its predominantly Hispanic and lower-income residents. Being surrounded by a dense cluster of industrial facilities compounds health and safety hazards, exacerbating environmental and social inequalities. Such conditions emphasize the urgent need for environmental measures that focus on investigating ambient air quality. This study estimated benzene, one of the most reported pollutants in Hartman Park, using machine learning-based approaches. Benzene data were collected in residential areas in the neighborhood and analyzed using a combination of five machine-learning algorithms (i.e., XGBR, GBR, LGBMR, CBR, RFR) through a newly developed ensemble learning model. Evaluations of model robustness, overfitting tests, 10-fold cross-validation, and internal and stratified validation were performed. We found that the ensemble model explained about 98.7% of the spatial variability of benzene (Adj. R2 = 0.987). Rigorous validation confirmed the stability of model performance. Several predictors that contribute to benzene were identified, including temperature, developed intensity areas, leaking petroleum storage tanks, and traffic-related factors. Analyzing spatial patterns, we found high benzene levels spread over areas near industrial zones as well as in residential areas. Overall, our study area was exposed to high benzene levels and requires extra attention from relevant authorities.
Affiliation(s)
- Aji Kusumaning Asri
- Department of Geomatics, College of Engineering, National Cheng Kung University, Tainan 701, Taiwan, ROC.
| | - Galen D Newman
- Department of Landscape Architecture and Urban Planning, School of Architecture Texas A&M University, 3137 TAMU, College Station, TX 77843, USA
| | - Zhihan Tao
- Department of Landscape Architecture and Urban Planning, School of Architecture Texas A&M University, 3137 TAMU, College Station, TX 77843, USA
| | - Rui Zhu
- Department of Landscape Architecture and Urban Planning, School of Architecture Texas A&M University, 3137 TAMU, College Station, TX 77843, USA
| | - Hsiu-Ling Chen
- Department of Food Safety Hygiene and Risk Management, National Cheng Kung University, Tainan 701, Taiwan, ROC
| | - Shih-Chun Candice Lung
- Research Center for Environmental Changes, Academia Sinica, Taipei, Taiwan, ROC; Department of Atmospheric Sciences, National Taiwan University, Taipei, Taiwan, ROC; Institute of Environmental Health, School of Public Health, National Taiwan University, Taipei, Taiwan, ROC
| | - Chih-Da Wu
- Department of Geomatics, College of Engineering, National Cheng Kung University, Tainan 701, Taiwan, ROC; National Institute of Environmental Health Sciences, National Health Research Institutes, Miaoli, Taiwan, ROC; Innovation and Development Center of Sustainable Agriculture, National Chung Hsing University, Taichung City 402, Taiwan, ROC; Research Center for Precision Environmental Medicine, Kaohsiung Medical University, Kaohsiung 804, Taiwan, ROC.
|
18
|
Zalzal J, Minet L, Brook J, Mihele C, Chen H, Hatzopoulou M. Capturing Exposure Disparities with Chemical Transport Models: Evaluating the Suitability of Downscaling Using Land Use Regression. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2024. [PMID: 39092553 DOI: 10.1021/acs.est.4c03725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]
Abstract
High resolution exposure surfaces are essential to capture disparities in exposure to traffic-related air pollution in urban areas. In this study, we develop an approach to downscale Chemical Transport Model (CTM) simulations to a hyperlocal level (∼100m) in the Greater Toronto Area (GTA) under three scenarios where emissions from cars, trucks and buses are zeroed out, thus capturing the burden of each transportation mode. This proposed approach statistically fuses CTMs with Land-Use Regression using machine learning techniques. With this proposed downscaling approach, changes in air pollutant concentrations under different scenarios are appropriately captured by downscaling factors that are trained to reflect the spatial distribution of emission reductions. Our validation analysis shows that high-resolution models resulted in better performance than coarse models when compared with observations at reference stations. We used this downscaling approach to assess disparities in exposure to nitrogen dioxide (NO2) for populations composed of renters, low-income households, recent immigrants, and visible minorities. Individuals in all four categories were disproportionately exposed to the burden of cars, trucks, and buses. We conducted this analysis at spatial resolutions of 12, 4, 1 km, and 100 m and observed that disparities were significantly underestimated when using coarse spatial resolutions. This reinforces the need for high-spatial resolution exposure surfaces for environmental justice analyses.
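The abstract's point that coarse grids understate exposure disparities can be illustrated with a toy one-dimensional transect. The sketch below is not the study's downscaling method; the concentrations, cell assignments, and function names are hypothetical, and block averaging stands in for a coarser model resolution.

```python
def block_average(values, block):
    """Replace each cell by the mean of its block, mimicking a coarser grid."""
    out = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        out.extend([sum(chunk) / len(chunk)] * len(chunk))
    return out


def _mean_at(surface, cells):
    """Mean concentration over the listed cell indices."""
    return sum(surface[i] for i in cells) / len(cells)


def exposure_gap(surface, group_cells, other_cells):
    """Difference in mean exposure between a group and everyone else."""
    return _mean_at(surface, group_cells) - _mean_at(surface, other_cells)


# Hypothetical NO2 transect (arbitrary units); cells 2-3 sit next to a road.
fine = [10.0, 10.0, 30.0, 30.0, 10.0, 10.0, 10.0, 10.0]
group_cells = [2, 3]              # where the more exposed group lives
other_cells = [0, 1, 4, 5, 6, 7]  # everyone else

coarse = block_average(fine, block=4)
fine_gap = exposure_gap(fine, group_cells, other_cells)
coarse_gap = exposure_gap(coarse, group_cells, other_cells)
print(fine_gap, coarse_gap)
```

Averaging spreads the roadside peak over neighboring cells, so the gap between the two groups shrinks on the coarse surface even though the underlying concentrations are unchanged.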
Affiliation(s)
- Jad Zalzal
- Department of Civil & Mineral Engineering, University of Toronto, 35 St George Street, Toronto, Ontario M5S 1A4, Canada
| | - Laura Minet
- Department of Civil Engineering, University of Victoria, 3800 Finnerty Road, Victoria, British Columbia V8P 5C2, Canada
| | - Jeffrey Brook
- Dalla Lana School of Public Health, University of Toronto, 155 College Street, Toronto, Ontario M5T 3M7, Canada
| | - Cristian Mihele
- Air Quality Research Division, Environment and Climate Change Canada, 4905 Dufferin Street, North York, Ontario M3H 5T4, Canada
| | - Hong Chen
- Dalla Lana School of Public Health, University of Toronto, 155 College Street, Toronto, Ontario M5T 3M7, Canada
- Environmental Health Science and Research Bureau, Health Canada, 50 Colombine Driveway, Ottawa, Ontario K1A 0K9, Canada
- Public Health Ontario, 480 University Avenue, Toronto, Ontario M5G 1V2, Canada
- ICES, 2075 Bayview Avenue, Toronto, Ontario M4N 3M5, Canada
| | - Marianne Hatzopoulou
- Department of Civil & Mineral Engineering, University of Toronto, 35 St George Street, Toronto, Ontario M5S 1A4, Canada
|
19
|
Venuta A, Lloyd M, Ganji A, Xu J, Simon L, Zhang M, Saeedi M, Yamanouchi S, Lavigne E, Hatzopoulou M, Weichenthal S. Predicting within-city spatiotemporal variations in daily median outdoor ultrafine particle number concentrations and size in Montreal and Toronto, Canada. Environ Epidemiol 2024; 8:e323. [PMID: 39045485 PMCID: PMC11265779 DOI: 10.1097/ee9.0000000000000323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 06/17/2024] [Indexed: 07/25/2024] Open
Abstract
Background Epidemiological evidence suggests that long-term exposure to outdoor ultrafine particles (UFPs, <0.1 μm) may have important human health impacts. However, less is known about the acute health impacts of these pollutants as few models are available to estimate daily within-city spatiotemporal variations in outdoor UFPs. Methods Several machine learning approaches (i.e., generalized additive models, random forest models, and extreme gradient boosting) were used to predict daily spatiotemporal variations in outdoor UFPs (number concentration and size) across Montreal and Toronto, Canada using a large database of mobile monitoring measurements. Separate models were developed for each city and all models were evaluated using a 10-fold cross-validation procedure. Results In total, our models were based on measurements from 12,705 road segments in Montreal and 10,929 road segments in Toronto. Daily median outdoor UFP number concentrations varied substantially across both cities with 1st-99th percentiles ranging from 1389 to 181,672 in Montreal and 2472 to 118,544 in Toronto. Outdoor UFP size tended to be smaller in Montreal (mean [SD]: 34 nm [15]) than in Toronto (mean [SD]: 44 nm [25]). Extreme gradient boosting models performed best and explained the majority of spatiotemporal variations in outdoor UFP number concentrations (Montreal, R2: 0.727; Toronto, R2: 0.723) and UFP size (Montreal, R2: 0.823; Toronto, R2: 0.898) with slopes close to one and intercepts close to zero for relationships between measured and predicted values. Conclusion These new models will be applied in future epidemiological studies examining the acute health impacts of outdoor UFPs in Canada's two largest cities.
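The evaluation described in this abstract (10-fold cross-validation, R2, and the slope and intercept of measured against predicted values) can be sketched in plain Python as follows. The round-robin fold assignment, the choice to regress measured on predicted, the toy numbers, and all function names are assumptions for illustration, not details from the study.

```python
def k_fold_indices(n_samples, k=10):
    """Split indices 0..n_samples-1 into k disjoint folds (round-robin)."""
    return [list(range(fold, n_samples, k)) for fold in range(k)]


def r_squared(measured, predicted):
    """Coefficient of determination of predictions against measurements."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical held-out measurements and the matching model predictions.
measured = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [12.0, 18.0, 31.0, 39.0, 50.0]

folds = k_fold_indices(23, k=10)
r2 = r_squared(measured, predicted)
slope, intercept = fit_line(predicted, measured)
print(r2, slope, intercept)
```

A slope near one and an intercept near zero indicate that predictions are not systematically stretched or offset relative to measurements, which R2 alone does not guarantee.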
Affiliation(s)
- Alessya Venuta
- Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Canada
| | - Marshall Lloyd
- Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Canada
| | - Arman Ganji
- Department of Civil and Mineral Engineering, University of Toronto, Toronto, Canada
| | - Junshi Xu
- Department of Civil and Mineral Engineering, University of Toronto, Toronto, Canada
| | - Leora Simon
- Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Canada
| | - Mingqian Zhang
- Department of Civil and Mineral Engineering, University of Toronto, Toronto, Canada
| | - Milad Saeedi
- Department of Civil and Mineral Engineering, University of Toronto, Toronto, Canada
| | - Shoma Yamanouchi
- Department of Civil and Mineral Engineering, University of Toronto, Toronto, Canada
| | - Eric Lavigne
- Environmental Health Science Research Bureau, Health Canada, Ottawa, Canada
| | - Marianne Hatzopoulou
- Department of Civil and Mineral Engineering, University of Toronto, Toronto, Canada
| | - Scott Weichenthal
- Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Canada
- Air Health Science Division, Health Canada, Ottawa, Canada
|
20
|
Chen J, Zhu S, Wang P, Zheng Z, Shi S, Li X, Xu C, Yu K, Chen R, Kan H, Zhang H, Meng X. Predicting particulate matter, nitrogen dioxide, and ozone across Great Britain with high spatiotemporal resolution based on random forest models. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 926:171831. [PMID: 38521267 DOI: 10.1016/j.scitotenv.2024.171831] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 03/13/2024] [Accepted: 03/18/2024] [Indexed: 03/25/2024]
Abstract
In Great Britain, few studies have employed machine learning methods to predict air pollution, especially ozone (O3), with high spatiotemporal resolution. This study aimed to address this gap by developing random forest models for four key pollutants (fine and inhalable particulate matter [PM2.5 and PM10], nitrogen dioxide [NO2], and O3) by integrating multiple-source predictors at a daily level and 1-km resolution. The out-of-bag R2 (root mean squared error, RMSE) between model predictions and measurements from monitoring stations in 2006-2013 was 0.85 (3.63 μg/m3) for PM2.5, 0.77 (6.00 μg/m3) for PM10, 0.85 (9.71 μg/m3) for NO2, and 0.85 (9.39 μg/m3) for maximum daily 8-h average (MDA8) O3 at the daily level, and the predictive accuracy was higher at the monthly and annual levels. The high-resolution predictions captured characteristic spatiotemporal patterns of the four pollutants. Higher concentrations of PM2.5, PM10, and NO2 were distributed in the densely populated southern regions of Great Britain, while O3 generally showed an inverse spatial pattern, which could not be fully depicted by monitoring stations. Therefore, the predictions produced in this study could improve exposure assessment, with less exposure misclassification and flexible exposure windows, for future epidemiological studies investigating the impact of air pollution across Great Britain.
Affiliation(s)
- Jiaxin Chen
- School of Public Health, Key Laboratory of Public Health Safety of the Ministry of Education and Key Laboratory of Health Technology Assessment of the Ministry of Health, Fudan University, Shanghai, 200032, China
- Shengqiang Zhu
- Department of Environmental Science and Engineering, Fudan University, Shanghai, 200438, China
- Peng Wang
- Department of Atmospheric and Oceanic Sciences, Fudan University, Shanghai, 200438, China; Shanghai Key Laboratory of Meteorology and Health IRDR International Center of Excellence on Risk Interconnectivity and Governance on Weather/Climate Extremes Impact and Public Health WMO/IGAC MAP-AQ Asian Office Shanghai, Fudan University, Shanghai, China
- Zhonghua Zheng
- Department of Earth and Environmental Sciences, The University of Manchester, Manchester, UK
- Su Shi
- School of Public Health, Key Laboratory of Public Health Safety of the Ministry of Education and Key Laboratory of Health Technology Assessment of the Ministry of Health, Fudan University, Shanghai, 200032, China
- Xinyue Li
- School of Public Health, Key Laboratory of Public Health Safety of the Ministry of Education and Key Laboratory of Health Technology Assessment of the Ministry of Health, Fudan University, Shanghai, 200032, China
- Chang Xu
- School of Public Health, Key Laboratory of Public Health Safety of the Ministry of Education and Key Laboratory of Health Technology Assessment of the Ministry of Health, Fudan University, Shanghai, 200032, China
- Kexin Yu
- School of Public Health, Key Laboratory of Public Health Safety of the Ministry of Education and Key Laboratory of Health Technology Assessment of the Ministry of Health, Fudan University, Shanghai, 200032, China
- Renjie Chen
- School of Public Health, Key Laboratory of Public Health Safety of the Ministry of Education and Key Laboratory of Health Technology Assessment of the Ministry of Health, Fudan University, Shanghai, 200032, China; Shanghai Key Laboratory of Meteorology and Health IRDR International Center of Excellence on Risk Interconnectivity and Governance on Weather/Climate Extremes Impact and Public Health WMO/IGAC MAP-AQ Asian Office Shanghai, Fudan University, Shanghai, China
- Haidong Kan
- School of Public Health, Key Laboratory of Public Health Safety of the Ministry of Education and Key Laboratory of Health Technology Assessment of the Ministry of Health, Fudan University, Shanghai, 200032, China; Shanghai Key Laboratory of Meteorology and Health IRDR International Center of Excellence on Risk Interconnectivity and Governance on Weather/Climate Extremes Impact and Public Health WMO/IGAC MAP-AQ Asian Office Shanghai, Fudan University, Shanghai, China
- Hongliang Zhang
- Department of Environmental Science and Engineering, Fudan University, Shanghai, 200438, China; Shanghai Key Laboratory of Meteorology and Health IRDR International Center of Excellence on Risk Interconnectivity and Governance on Weather/Climate Extremes Impact and Public Health WMO/IGAC MAP-AQ Asian Office Shanghai, Fudan University, Shanghai, China.
- Xia Meng
- School of Public Health, Key Laboratory of Public Health Safety of the Ministry of Education and Key Laboratory of Health Technology Assessment of the Ministry of Health, Fudan University, Shanghai, 200032, China; Shanghai Key Laboratory of Meteorology and Health IRDR International Center of Excellence on Risk Interconnectivity and Governance on Weather/Climate Extremes Impact and Public Health WMO/IGAC MAP-AQ Asian Office Shanghai, Fudan University, Shanghai, China.
21
Asri AK, Lee HY, Chen YL, Wong PY, Hsu CY, Chen PC, Lung SCC, Chen YC, Wu CD. A machine learning-based ensemble model for estimating diurnal variations of nitrogen oxide concentrations in Taiwan. The Science of the Total Environment 2024; 916:170209. [PMID: 38278267 DOI: 10.1016/j.scitotenv.2024.170209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 01/02/2024] [Accepted: 01/14/2024] [Indexed: 01/28/2024]
Abstract
Air pollution is inextricable from human activity patterns. This is especially true for nitrogen oxide (NOx), a pollutant that exists naturally and also as a result of anthropogenic factors. Assessing exposure by considering diurnal variation is a challenge that has not been widely studied. Incorporating 27 years of data, we attempted to estimate diurnal variations in NOx across Taiwan. We developed a machine learning-based ensemble model that integrated hybrid kriging-LUR, machine-learning, and an ensemble learning approach. Hybrid kriging-LUR was performed to select the most influential predictors, and machine-learning algorithms were applied to improve model performance. The three best machine-learning algorithms were suited and reassessed to develop ensemble learning that was designed to improve model performance. Our ensemble model resulted in estimates of daytime, nighttime, and daily NOx with high explanatory powers (Adj-R2) of 0.93, 0.98, and 0.94, respectively. These explanatory powers increased from the initial model that used only hybrid kriging-LUR. Additionally, the results depicted the temporal variation of NOx, with concentrations higher during the daytime than the nighttime. Regarding spatial variation, the highest NOx concentrations were identified in northern and western Taiwan. Model evaluations confirmed the reliability of the models. This study could serve as a reference for regional planning supporting emission control for environmental and human health.
Affiliation(s)
- Aji Kusumaning Asri
- Department of Geomatics, College of Engineering, National Cheng Kung University, Tainan, Taiwan.
- Hsiao-Yun Lee
- Department of Leisure Industry and Health Promotion, National Taipei University of Nursing and Health Sciences, Taipei, Taiwan.
- Yu-Ling Chen
- Department of Geomatics, College of Engineering, National Cheng Kung University, Tainan, Taiwan.
- Pei-Yi Wong
- Department of Environmental and Occupational Health, National Cheng Kung University, Tainan, Taiwan.
- Chin-Yu Hsu
- Department of Safety, Health and Environmental Engineering, Ming Chi University of Technology, Taiwan; Center for Environmental Sustainability and Human Health, Ming Chi University of Technology, Taiwan.
- Pau-Chung Chen
- National Institute of Environmental Health Sciences, National Health Research Institutes, Miaoli, Taiwan; Institute of Environmental and Occupational Health Sciences, National Taiwan University College of Public Health, Taipei, Taiwan; Department of Environmental and Occupational Medicine, National Taiwan University Hospital, Taipei, Taiwan; Department of Public Health, National Taiwan University College of Public Health, Taipei, Taiwan.
- Shih-Chun Candice Lung
- Research Center for Environmental Changes, Academia Sinica, Taipei, Taiwan; Department of Atmospheric Sciences, National Taiwan University, Taipei, Taiwan; Institute of Environmental Health, School of Public Health, National Taiwan University, Taipei, Taiwan.
- Yu-Cheng Chen
- National Institute of Environmental Health Sciences, National Health Research Institutes, Miaoli, Taiwan; Department of Occupational Safety and Health, China Medical University, Taichung, Taiwan.
- Chih-Da Wu
- Department of Geomatics, College of Engineering, National Cheng Kung University, Tainan, Taiwan; National Institute of Environmental Health Sciences, National Health Research Institutes, Miaoli, Taiwan; Innovation and Development Center of Sustainable Agriculture, National Chung Hsing University, Taichung City 402, Taiwan.
22
Ren X, Mi Z, Georgopoulos PG. Socioexposomics of COVID-19 across New Jersey: a comparison of geostatistical and machine learning approaches. Journal of Exposure Science & Environmental Epidemiology 2024; 34:197-207. [PMID: 36725924 PMCID: PMC9889956 DOI: 10.1038/s41370-023-00518-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 12/29/2022] [Accepted: 01/06/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND Disparities in adverse COVID-19 health outcomes have been associated with multiple social and environmental stressors. However, research is needed to evaluate the consistency and efficiency of methods for studying these associations at local scales. OBJECTIVE To assess socioexposomic associations with COVID-19 outcomes across New Jersey and evaluate consistency of findings from multiple modeling approaches. METHODS We retrieved data for COVID-19 cases and deaths for the 565 municipalities of New Jersey up to the end of the first phase of the pandemic, and calculated mortality rates with and without long-term-care (LTC) facility deaths. We considered 84 spatially heterogeneous environmental, demographic and socioeconomic factors from publicly available databases, including air pollution, proximity to industrial sites/facilities, transportation-related noise, occupation and commuting, neighborhood and housing characteristics, age structure, racial/ethnic composition, poverty, etc. Six geostatistical models (Poisson/Negative-Binomial regression, Poisson/Negative-Binomial mixed effect model, Poisson/Negative-Binomial Besag-York-Mollié spatial model) and two Machine Learning (ML) methods (Random Forest, Extreme Gradient Boosting) were implemented to assess association patterns. The Shapley effects plot was established for explainable ML and change of support validation was introduced to compare performances of different approaches. RESULTS We found robust positive associations of COVID-19 mortality with historic exposures to NO2, population density, percentage of minority and below high school education, and other social and environmental factors. Exclusion of LTC deaths does not significantly affect correlations for most factors but findings can be substantially influenced by model structures and assumptions. The best performing geostatistical models involved flexible structures representing data variations. ML methods captured association patterns consistent with the best performing geostatistical models, and furthermore detected consistent nonlinear associations not captured by geostatistical models. SIGNIFICANCE The findings of this work improve the understanding of how social and environmental disparities impacted COVID-19 outcomes across New Jersey.
Affiliation(s)
- Xiang Ren
- Environmental and Occupational Health Sciences Institute (EOHSI), Rutgers University, Piscataway, NJ, 08854, USA
- Department of Chemical and Biochemical Engineering, Rutgers University, Piscataway, NJ, 08854, USA
- Department of Environmental and Occupational Health and Justice, Rutgers School of Public Health, Piscataway, NJ, 08854, USA
- Zhongyuan Mi
- Environmental and Occupational Health Sciences Institute (EOHSI), Rutgers University, Piscataway, NJ, 08854, USA
- Department of Environmental Sciences, Rutgers University, New Brunswick, NJ, 08901, USA
- Panos G Georgopoulos
- Environmental and Occupational Health Sciences Institute (EOHSI), Rutgers University, Piscataway, NJ, 08854, USA.
- Department of Chemical and Biochemical Engineering, Rutgers University, Piscataway, NJ, 08854, USA.
- Department of Environmental and Occupational Health and Justice, Rutgers School of Public Health, Piscataway, NJ, 08854, USA.
- Department of Environmental Sciences, Rutgers University, New Brunswick, NJ, 08901, USA.
23
Hsu CY, Lee RQ, Wong PY, Candice Lung SC, Chen YC, Chen PC, Adamkiewicz G, Wu CD. Estimating morning and evening commute period O3 concentration in Taiwan using a fine spatial-temporal resolution ensemble mixed spatial model with Geo-AI technology. Journal of Environmental Management 2024; 351:119725. [PMID: 38064987 DOI: 10.1016/j.jenvman.2023.119725] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/01/2023] [Revised: 11/05/2023] [Accepted: 11/25/2023] [Indexed: 01/14/2024]
Abstract
Elevated levels of ground-level ozone (O3) can have harmful effects on health. While previous studies have focused mainly on daily averages and daytime patterns, it's crucial to consider the effects of air pollution during daily commutes, as this can significantly contribute to overall exposure. This study is also the first to employ an ensemble mixed spatial model (EMSM) that integrates multiple machine learning algorithms and predictor variables selected using SHapley Additive exPlanations (SHAP) values to predict spatial-temporal fluctuations in O3 concentrations across the entire island of Taiwan. We utilized geospatial-artificial intelligence (Geo-AI), incorporating kriging, land use regression (LUR), machine learning (random forest (RF), categorical boosting (CatBoost), gradient boosting (GBM), extreme gradient boosting (XGBoost), and light gradient boosting (LightGBM)), and ensemble learning techniques to develop ensemble mixed spatial models (EMSMs) for morning and evening commute periods. The EMSMs were used to estimate long-term spatiotemporal variations of O3 levels, accounting for in-situ measurements, meteorological factors, geospatial predictors, and social and seasonal influences over a 26-year period. Compared to conventional LUR-based approaches, the EMSMs improved performance by 58% for both commute periods, with high explanatory power and an adjusted R2 of 0.91. Internal and external validation procedures and verification of O3 concentrations at the upper percentile ranges (in 1%, 5%, 10%, 15%, 20%, and 25%) and other conditions (including rain, no rain, weekday, weekend, festival, and no festival) have demonstrated that the models are stable and free from overfitting issues. Estimation maps were generated to examine changes in O3 levels before and during the implementation of COVID-19 restrictions. These findings provide accurate variations of O3 levels in commute periods at a high spatiotemporal resolution (daily, on a 50 m × 50 m grid), which can support pollution control efforts and aid in epidemiological studies.
Affiliation(s)
- Chin-Yu Hsu
- Department of Safety, Health and Environmental Engineering, Ming Chi University of Technology, New Taipei, Taiwan; Center for Environmental Sustainability and Human Health, Ming Chi University of Technology, New Taipei, Taiwan
- Ruei-Qin Lee
- Department of Geomatics, National Cheng Kung University, Tainan, Taiwan
- Pei-Yi Wong
- Department of Environmental and Occupational Health, National Cheng Kung University, Tainan, Taiwan
- Shih-Chun Candice Lung
- Research Center for Environmental Changes, Academia Sinica, Taipei, Taiwan; Department of Atmospheric Sciences, National Taiwan University, Taipei, Taiwan
- Yu-Cheng Chen
- National Institute of Environmental Health Sciences, National Health Research Institutes, Miaoli, Taiwan
- Pau-Chung Chen
- National Institute of Environmental Health Sciences, National Health Research Institutes, Miaoli, Taiwan; Institute of Environmental and Occupational Health Sciences, National Taiwan University College of Public Health, Taipei, Taiwan; Department of Public Health, National Taiwan University College of Public Health, Taipei, Taiwan; Department of Environmental and Occupational Medicine, National Taiwan University Hospital and National Taiwan University College of Medicine, Taipei, Taiwan
- Gary Adamkiewicz
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Chih-Da Wu
- Department of Geomatics, National Cheng Kung University, Tainan, Taiwan; National Institute of Environmental Health Sciences, National Health Research Institutes, Miaoli, Taiwan; Innovation and Development Center of Sustainable Agriculture, National Chung Hsing University, Tainan, Taiwan.
24
Ma X, Zou B, Deng J, Gao J, Longley I, Xiao S, Guo B, Wu Y, Xu T, Xu X, Yang X, Wang X, Tan Z, Wang Y, Morawska L, Salmond J. A comprehensive review of the development of land use regression approaches for modeling spatiotemporal variations of ambient air pollution: A perspective from 2011 to 2023. Environment International 2024; 183:108430. [PMID: 38219544 DOI: 10.1016/j.envint.2024.108430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Revised: 11/26/2023] [Accepted: 01/04/2024] [Indexed: 01/16/2024]
Abstract
Land use regression (LUR) models are widely used in epidemiological and environmental studies to estimate humans' exposure to air pollution within urban areas. However, the early models, developed using linear regressions and data from fixed monitoring stations and passive sampling, were primarily designed to model traditional and criteria air pollutants and had limitations in capturing high-resolution spatiotemporal variations of air pollution. Over the past decade, there has been a notable development of multi-source observations from low-cost monitors, mobile monitoring, and satellites, in conjunction with the integration of advanced statistical methods and spatially and temporally dynamic predictors, which have facilitated significant expansion and advancement of LUR approaches. This paper reviews and synthesizes the recent advances in LUR approaches from the perspectives of the changes in air quality data acquisition, novel predictor variables, advances in model-developing approaches, improvements in validation methods, model transferability, and modeling software as reported in 155 LUR studies published between 2011 and 2023. We demonstrate that these developments have enabled LUR models to be developed for larger study areas and encompass a wider range of criteria and unregulated air pollutants. LUR models in the conventional spatial structure have been complemented by more complex spatiotemporal structures. Compared with linear models, advanced statistical methods yield better predictions when handling data with complex relationships and interactions. Finally, this study explores new developments, identifies potential pathways for further breakthroughs in LUR methodologies, and proposes future research directions. In this context, LUR approaches have the potential to make a significant contribution to future efforts to model the patterns of long- and short-term exposure of urban populations to air pollution.
Affiliation(s)
- Xuying Ma
- College of Geomatics, Xi'an University of Science and Technology, Xi'an 710054, China; College of Safety Science and Engineering, Xi'an University of Science and Technology, Xi'an 710054, China; International Laboratory for Air Quality and Health, Queensland University of Technology, Brisbane, Queensland 4000, Australia.
- Bin Zou
- School of Geosciences and Info-Physics, Central South University, Changsha, Hunan 410083, China.
- Jun Deng
- College of Safety Science and Engineering, Xi'an University of Science and Technology, Xi'an 710054, China; Shaanxi Key Laboratory of Prevention and Control of Coal Fire, Xi'an University of Science and Technology, Xi'an 710054, China
- Jay Gao
- School of Environment, Faculty of Science, University of Auckland, Auckland 1010, New Zealand
- Ian Longley
- National Institute of Water and Atmospheric Research, Auckland 1010, New Zealand
- Shun Xiao
- School of Geography and Tourism, Shaanxi Normal University, Xi'an 710119, China
- Bin Guo
- College of Geomatics, Xi'an University of Science and Technology, Xi'an 710054, China
- Yarui Wu
- College of Geomatics, Xi'an University of Science and Technology, Xi'an 710054, China
- Tingting Xu
- School of Software Engineering, Chongqing University of Post and Telecommunications, Chongqing 400065, China
- Xin Xu
- Xi'an Institute for Innovative Earth Environment Research, Xi'an 710061, China
- Xiaosha Yang
- Shandong Nova Fitness Co., Ltd., Baoji, Shaanxi 722404, China
- Xiaoqi Wang
- College of Geomatics, Xi'an University of Science and Technology, Xi'an 710054, China
- Zelei Tan
- College of Geomatics, Xi'an University of Science and Technology, Xi'an 710054, China
- Yifan Wang
- College of Geomatics, Xi'an University of Science and Technology, Xi'an 710054, China
- Lidia Morawska
- International Laboratory for Air Quality and Health, Queensland University of Technology, Brisbane, Queensland 4000, Australia.
- Jennifer Salmond
- School of Environment, Faculty of Science, University of Auckland, Auckland 1010, New Zealand
25
Song J, Li J, Zhao R, Chu X. Developing predictive models for surgical outcomes in patients with degenerative cervical myelopathy: a comparison of statistical and machine learning approaches. Spine J 2024; 24:57-67. [PMID: 37531977 DOI: 10.1016/j.spinee.2023.07.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 07/16/2023] [Accepted: 07/26/2023] [Indexed: 08/04/2023]
Abstract
BACKGROUND CONTEXT Machine learning (ML) is widely used to predict the prognosis of numerous diseases. PURPOSE This retrospective analysis aimed to develop a prognostic prediction model using ML algorithms and identify predictors associated with poor surgical outcomes in patients with degenerative cervical myelopathy (DCM). STUDY DESIGN A retrospective study. PATIENT SAMPLE A total of 406 symptomatic DCM patients who underwent surgical decompression were enrolled and analyzed from three independent medical centers. OUTCOME MEASURES We calculated the area under the curve (AUC), classification accuracy, sensitivity, and specificity of each model. METHODS The Japanese Orthopedic Association (JOA) score was obtained before and 1 year following decompression surgery, and patients were grouped into good and poor outcome groups based on a cut-off value of 60% based on a previous study. Two datasets were fused for training, 1 dataset was held out as an external validation set. Optimal feature-subset and hyperparameters for each model were adjusted based on a 2,000-resample bootstrap-based internal validation via exhaustive search and grid search. The performance of each model was then tested on the external validation set. RESULTS The Support Vector Machine (SVM) model showed the highest predictive accuracy compared to other methods, with an AUC of 0.82 and an accuracy of 75.7%. Age, sex, disease duration, and preoperative JOA score were identified as the most commonly selected features by both the ML and statistical models. Grid search optimization for hyperparameters successfully enhanced the predictive performance of each ML model, and the SVM model still had the best performance with an AUC of 0.93 and an accuracy of 86.4%. CONCLUSIONS Overall, the study demonstrated that ML classifiers such as SVM can effectively predict surgical outcomes for patients with DCM while identifying associated predictors in a multivariate manner.
Affiliation(s)
- Jiajun Song
- Department of Orthopedic Surgery, Tianjin Medical University General Hospital, Tianjin 300052, China
- Jie Li
- Department of Minimally Invasive Spine Surgery, Tianjin Hospital, Tianjin 300211, China
- Rui Zhao
- Department of Orthopedic Surgery, Tianjin Medical University General Hospital, Tianjin 300052, China
- Xu Chu
- Department of Orthopedic Surgery, Honghui Hospital, Xi'an Jiaotong University, Xi'an 710054, China.
26
Zeng X, Zhan Y, Zhou W, Qiu Z, Wang T, Chen Q, Qu D, Huang Q, Cao J, Zhou N. The Influence of Airborne Particulate Matter on the Risk of Gestational Diabetes Mellitus: A Large Retrospective Study in Chongqing, China. Toxics 2023; 12:19. [PMID: 38250975 PMCID: PMC10818620 DOI: 10.3390/toxics12010019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 12/17/2023] [Accepted: 12/21/2023] [Indexed: 01/23/2024]
Abstract
Emerging research findings suggest that airborne particulate matter might be a risk factor for gestational diabetes mellitus (GDM). However, the concentration-response relationships and the susceptible time windows for different types of particulate matter may vary. In this retrospective analysis, we employ a novel robust approach to assess the crucial time windows regarding the prevalence of GDM and to distinguish the susceptibility of three GDM subtypes to air pollution exposure. This study included 16,303 pregnant women who received routine antenatal care in 2018-2021 at the Maternal and Child Health Hospital in Chongqing, China. In total, 2482 women (15.2%) were diagnosed with GDM. We assessed the individual daily average exposure to air pollution, including PM2.5, PM10, O3, NO2, SO2, and CO based on the volunteers' addresses. We used high-accuracy gridded air pollution data generated by machine learning models to assess particulate matter per maternal exposure levels. We further analyzed the association of pre-pregnancy, early, and mid-pregnancy exposure to environmental pollutants using a generalized additive model (GAM) and distributed lag nonlinear models (DLNMs) to analyze the association between exposure at specific gestational weeks and the risk of GDM. We observed that, during the first trimester, per IQR increases for PM10 and PM2.5 exposure were associated with increased GDM risk (PM10: OR = 1.19, 95%CI: 1.07~1.33; PM2.5: OR = 1.32, 95%CI: 1.15~1.50) and isolated post-load hyperglycemia (GDM-IPH) risk (PM10: OR = 1.23, 95%CI: 1.09~1.39; PM2.5: OR = 1.38, 95%CI: 1.18~1.61). Second-trimester O3 exposure was positively correlated with the associated risk of GDM, while pre-pregnancy and first-trimester exposure was negatively associated with the risk of GDM-IPH. Exposure to SO2 in the second trimester was negatively associated with the risk of GDM-IPH. 
However, there were no observed associations between NO2 and CO exposure and the risk of GDM and its subgroups. Our results suggest that maternal exposure to particulate matter during early pregnancy and exposure to O3 in the second trimester might increase the risk of GDM, and GDM-IPH is the susceptible GDM subtype to airborne particulate matter exposure.
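The "per IQR increase" odds ratios quoted in this abstract follow a common reporting convention: a log-odds coefficient estimated per unit of exposure is rescaled to one interquartile range. The sketch below shows only that generic rescaling, assuming a simple logistic coefficient with a standard error; the study itself used GAM and DLNM models, and the function name and input numbers here are hypothetical.

```python
import math

def odds_ratio_per_iqr(beta, se, iqr, z=1.96):
    """Rescale a per-unit log-odds coefficient to an odds ratio per IQR.

    beta: log-odds change per 1 unit of exposure (e.g. per 1 ug/m3)
    se:   standard error of beta
    iqr:  interquartile range of the exposure, in the same units
    z:    normal quantile for the confidence level (1.96 for about 95%)
    """
    point = math.exp(beta * iqr)
    lower = math.exp((beta - z * se) * iqr)
    upper = math.exp((beta + z * se) * iqr)
    return point, lower, upper

# Hypothetical inputs: beta = 0.01 per ug/m3, SE = 0.002, IQR = 20 ug/m3.
point, lower, upper = odds_ratio_per_iqr(0.01, 0.002, 20.0)
print(f"OR per IQR = {point:.2f} (95% CI: {lower:.2f} to {upper:.2f})")
```

An interval that lies entirely above 1, as in the PM10 and PM2.5 results quoted above, indicates a positive association at the chosen confidence level.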
Affiliation(s)
- Xiaoling Zeng
- Institute of Toxicology, Faculty of Military Preventive Medicine, Army Medical University (Third Military Medical University), Chongqing 400038, China; (X.Z.); (T.W.); (Q.C.)
- School of Public Health, China Medical University, Shenyang 110122, China
- Yu Zhan
- Department of Environmental Science and Engineering, Sichuan University, Chengdu 610065, China; (Y.Z.); (Z.Q.)
- Wei Zhou
- Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children (Women and Children’s Hospital of Chongqing Medical University), Chongqing 401147, China; (W.Z.); (Q.H.)
- Zhimei Qiu
- Department of Environmental Science and Engineering, Sichuan University, Chengdu 610065, China; (Y.Z.); (Z.Q.)
- Tong Wang
- Institute of Toxicology, Faculty of Military Preventive Medicine, Army Medical University (Third Military Medical University), Chongqing 400038, China; (X.Z.); (T.W.); (Q.C.)
- Qing Chen
- Institute of Toxicology, Faculty of Military Preventive Medicine, Army Medical University (Third Military Medical University), Chongqing 400038, China; (X.Z.); (T.W.); (Q.C.)
- Dandan Qu
- Clinical Research Centre, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China
- Chongqing Research Centre for Prevention & Control of Maternal and Child Diseases and Public Health, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China
- Qiao Huang
- Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children (Women and Children’s Hospital of Chongqing Medical University), Chongqing 401147, China; (W.Z.); (Q.H.)
- Jia Cao
- Institute of Toxicology, Faculty of Military Preventive Medicine, Army Medical University (Third Military Medical University), Chongqing 400038, China; (X.Z.); (T.W.); (Q.C.)
- Niya Zhou
- Clinical Research Centre, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China
- Chongqing Research Centre for Prevention & Control of Maternal and Child Diseases and Public Health, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China
27
Nelson D, Choi Y, Sadeghi B, Yeganeh AK, Ghahremanloo M, Park J. A comprehensive approach combining positive matrix factorization modeling, meteorology, and machine learning for source apportionment of surface ozone precursors: Underlying factors contributing to ozone formation in Houston, Texas. Environmental Pollution (Barking, Essex: 1987) 2023; 334:122223. [PMID: 37481031 DOI: 10.1016/j.envpol.2023.122223] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/25/2023] [Revised: 07/13/2023] [Accepted: 07/17/2023] [Indexed: 07/24/2023]
Abstract
Ozone concentrations in Houston, Texas, are among the highest in the United States, posing significant risks to human health. This study aimed to evaluate the impact of various emissions sources and meteorological factors on ozone formation in Houston from 2017 to 2021 using a comprehensive PMF-SHAP approach. First, we distinguished the unique sources of VOCs in each area and identified differences in the local chemistry that affect ozone production. At the urban station, the primary sources were n_decane, biogenic/industrial/fuel evaporation, oil and gas flaring/production, industrial emissions/evaporation, and ethylene/propylene/aromatics. At the industrial site, the main sources were industrial emissions/evaporation, fuel evaporation, vehicle-related sources, oil and gas flaring/production, biogenic, aromatic, and ethylene and propylene. And then, we performed SHAP analysis to determine the importance and impact of each emissions factor and meteorological variables. Shortwave radiation (SHAP values are ∼5.74 and ∼6.3 for Milby Park and Lynchburg, respectively) and humidity (∼4.87 and ∼4.71, respectively) were the most important variables for both sites. For the urban station, the most important emissions sources were n_decane (∼2.96), industrial emissions/evaporation (∼1.89), and ethylene/propylene/aromatics (∼1.57), while for the industrial site, they were oil and gas flaring/production (∼1.38), ethylene/propylene (∼1.26), and industrial emissions/evaporation (∼0.95). NOx had a negative impact on ozone production at the urban station due to the NOx-rich chemical regime, whereas NOx had positive impacts at the industrial site. The study's findings suggest that the PMF-SHAP approach is efficient, inexpensive, and can be applied to other similar applications to identify factors contributing to ozone-exceedance events. The study's results can be used to develop more effective air quality management strategies for Houston and other cities with high levels of ozone.
Affiliation(s)
- Delaney Nelson
- Department of Earth and Atmospheric Science, University of Houston, Texas, USA
- Yunsoo Choi
- Department of Earth and Atmospheric Science, University of Houston, Texas, USA.
- Bavand Sadeghi
- Air Resources Laboratory, National Oceanic and Atmospheric Administration, College Park, MD, 20740, USA; Cooperative Institute for Satellite Earth System Studies, University of Maryland, College Park, MD, 20740, USA
| | | | - Masoud Ghahremanloo
- Department of Earth and Atmospheric Science, University of Houston, Texas, USA
| | - Jincheol Park
- Department of Earth and Atmospheric Science, University of Houston, Texas, USA
| |
Collapse
|
28
|
Gong X, Liu L, Huang Y, Zou B, Sun Y, Luo L, Lin Y. A pruned feed-forward neural network (pruned-FNN) approach to measure air pollution exposure. ENVIRONMENTAL MONITORING AND ASSESSMENT 2023; 195:1183. [PMID: 37695355 PMCID: PMC10829730 DOI: 10.1007/s10661-023-11814-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Accepted: 08/30/2023] [Indexed: 09/12/2023]
Abstract
Environmental epidemiology studies require accurate estimations of exposure intensities to air pollution. The process from air pollutant emission to individual exposure is, however, complex and nonlinear, which poses significant modeling challenges. This study aims to develop an exposure assessment model that can strike a balance between accuracy, complexity, and usability. In this regard, neural networks offer one possible approach. This study employed a custom-designed pruned feed-forward neural network (pruned-FNN) approach to calculate the air pollution exposure index based on emission time and rates, terrain factors, meteorological conditions, and proximity measurements. The model's performance was evaluated by cross-validating the estimated exposure indexes with ground-based monitoring records. The pruned-FNN can predict pollution exposure indexes (PEIs) that are highly and stably correlated with the monitored air pollutant concentrations (Spearman's rank correlation coefficients, mean ± standard deviation: 0.906 ± 0.028 for tenfold cross-validation and 0.913 ± 0.024 for random cross-validation). The predicted values are also close to the ground truth in most cases (95.5% of the predicted PEIs have relative errors smaller than 10%) when the training datasets are sufficiently large and well-covered. The pruned-FNN method can make accurate exposure estimations using a flexible number of variables and less extensive data, at lower cost and in less time. Compared to other exposure assessment models, the pruned-FNN is an appropriate and effective approach for exposure assessment that covers a large geographic area over a long period of time.
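The validation statistic quoted in this abstract (Spearman's rank correlation, summarized as mean ± standard deviation over cross-validation folds) can be sketched as follows. This is a minimal illustration in plain Python and not the authors' code; the function names and the `(predicted, observed)` fold layout are assumptions made for the example.

```python
from statistics import mean, stdev


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def summarize_folds(folds):
    """folds: list of (predicted, observed) pairs, one per held-out fold.

    Returns (mean rho, sample standard deviation of rho) across folds.
    """
    rhos = [spearman(pred, obs) for pred, obs in folds]
    return mean(rhos), stdev(rhos)
```

Calling `summarize_folds` with one `(predicted, observed)` pair per held-out fold yields the two numbers reported in the "mean ± standard deviation" form.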
Affiliation(s)
- Xi Gong
  - Department of Geography & Environmental Studies, UNM Center for the Advancement of Spatial Informatics Research and Education (ASPIRE), University of New Mexico, Albuquerque, NM, 87131, USA
- Lin Liu
  - Department of Computer Science, UNM Center for the Advancement of Spatial Informatics Research and Education (ASPIRE), University of New Mexico, Albuquerque, NM, 87131, USA
- Yanhong Huang
  - Department of Geography & Environmental Studies, UNM Center for the Advancement of Spatial Informatics Research and Education (ASPIRE), University of New Mexico, Albuquerque, NM, 87131, USA
- Bin Zou
  - School of Geosciences and Info-Physics, Central South University, Changsha, 410083, Hunan, China
- Yeran Sun
  - Department of Geography, University of Lincoln, Brayford Pool, Lincoln, LN6 7TS, UK
- Li Luo
  - Division of Epidemiology, Biostatistics, and Preventive Medicine, Department of Internal Medicine, University of New Mexico Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM, 87131, USA
- Yan Lin
  - Department of Geography & Environmental Studies, UNM Center for the Advancement of Spatial Informatics Research and Education (ASPIRE), University of New Mexico, Albuquerque, NM, 87131, USA
|
29
|
Sarroeira R, Henriques J, Sousa AM, Ferreira da Silva C, Nunes N, Moro S, Botelho MDC. Monitoring Sensors for Urban Air Quality: The Case of the Municipality of Lisbon. SENSORS (BASEL, SWITZERLAND) 2023; 23:7702. [PMID: 37765759 PMCID: PMC10537901 DOI: 10.3390/s23187702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 08/30/2023] [Accepted: 09/01/2023] [Indexed: 09/29/2023]
Abstract
Air pollution is a global issue that impacts environmental inequalities, and air quality sensors can have a decisive role in city policymaking for future cities. Science and society are already aware that during the most challenging times of COVID-19, the levels of air pollution in cities decreased, especially during lockdowns, when road traffic was reduced. Several pollution parameters can be used to analyse cities' environmental challenges, and it is more pressing than ever to have city climate decisions supported by sensor data. We have applied a data science approach to understand the evolution of the levels of carbon monoxide, nitrogen dioxide, particulate matter 2.5, and particulate matter 10 between August 2021 and July 2022. The analysis of the air quality levels, captured for the first time via 80 monitoring stations distributed throughout the municipality of Lisbon, has allowed us to realize that nitrogen dioxide and particulate matter 10 exceed the levels that are recommended by the World Health Organization, thereby increasing the health risk for those who live and work in Lisbon. Supported by these findings, we propose a central role for air quality sensors for policymaking in future cities, taking as a case study the municipality of Lisbon, Portugal, which is among the European cities that recently proposed to become climate-neutral and smart cities by 2030.
Affiliation(s)
- Rodrigo Sarroeira
  - ISTAR, Instituto Universitário de Lisboa (ISCTE-IUL), 1649-026 Lisboa, Portugal
- João Henriques
  - CIES, Instituto Universitário de Lisboa (ISCTE-IUL), 1649-026 Lisboa, Portugal
- Ana M. Sousa
  - CERENA, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisboa, Portugal
- Nuno Nunes
  - CIES, Instituto Universitário de Lisboa (ISCTE-IUL), 1649-026 Lisboa, Portugal
- Sérgio Moro
  - ISTAR, Instituto Universitário de Lisboa (ISCTE-IUL), 1649-026 Lisboa, Portugal
- Maria do Carmo Botelho
  - CIES, Instituto Universitário de Lisboa (ISCTE-IUL), 1649-026 Lisboa, Portugal
|
30
|
Dai H, Huang G, Wang J, Zeng H. VAR-tree model based spatio-temporal characterization and prediction of O3 concentration in China. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2023; 257:114960. [PMID: 37116452 DOI: 10.1016/j.ecoenv.2023.114960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2022] [Revised: 04/16/2023] [Accepted: 04/24/2023] [Indexed: 05/08/2023]
Abstract
Ozone (O3) pollution in the atmosphere is getting worse in many cities. In order to improve the accuracy of O3 prediction and obtain the spatial distribution of O3 concentration over a continuous period of time, this paper proposes a VAR-XGBoost model based on vector autoregression (VAR), the Kriging method and XGBoost (Extreme Gradient Boosting). China is used as an example and its spatial distribution of O3 is simulated. In this paper, the O3 concentration data of the monitoring sites in China are obtained, then a spatial prediction method of O3 mass concentration based on the VAR-XGBoost model is established, and finally its influencing factors are analyzed. This paper concludes that O3 features the highest correlation with PM2.5 and the lowest correlation with SO2. Among the measurement factors, wind speed and temperature are the most important factors affecting O3 pollution, and both are positively correlated with O3 pollution. In addition, precipitation is negatively correlated with 8-hour ozone concentration. In this paper, the performance of the VAR-XGBoost model is evaluated based on the ten-fold cross-validation method of sample, site and time, and a comparison with the results of XGBoost, CatBoost (categorical boosting), ExtraTrees, GBDT (gradient boosting decision tree), AdaBoost (adaptive boosting), RF (random forest), Decision tree, and LightGBM (light gradient boosting machine) models is conducted. The result shows that the prediction accuracy of the VAR-XGBoost model is better than that of the other models. The seasonal and annual average R2 reaches 0.94 (spring), 0.93 (summer), 0.92 (autumn), 0.93 (winter), and 0.95 (average from 2016 to 2021). The data show that the VAR-XGBoost model performs well in simulating the spatial distribution of O3 concentrations in China. The spatial distribution of O3 concentrations in China is clearly high in the east and low in the west, and is strongly influenced by topographical factors. Seasonally, the mean concentration is clearly low in winter and high in summer. The results of this study can provide a scientific basis for the prevention and control of regional O3 pollution in China, and can also provide new ideas for the acquisition of data on the spatial distribution of O3 concentrations within cities.
Affiliation(s)
- Hongbin Dai
  - School of Management, Xi'an University of Architecture and Technology, Xi'an 710055, China
- Guangqiu Huang
  - School of Management, Xi'an University of Architecture and Technology, Xi'an 710055, China
- Jingjing Wang
  - College of Vocational and Technical Education, Guangxi Science&Technology of Normal University, Laibin 546199, China
- Huibin Zeng
  - School of Management, Xi'an University of Architecture and Technology, Xi'an 710055, China
|
31
|
Ghahremanloo M, Choi Y, Lops Y. Deep learning mapping of surface MDA8 ozone: The impact of predictor variables on ozone levels over the contiguous United States. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2023; 326:121508. [PMID: 36967006 DOI: 10.1016/j.envpol.2023.121508] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/12/2023] [Revised: 03/19/2023] [Accepted: 03/22/2023] [Indexed: 06/18/2023]
Abstract
The limited number of ozone monitoring stations imposes uncertainty in various applications, calling for accurate approaches to capturing ozone values in all regions, particularly those with no in-situ measurements. This study uses deep learning (DL) to accurately estimate daily maximum 8-hr average (MDA8) ozone and examines the spatial contribution of several factors to ozone levels over the contiguous U.S. (CONUS) in 2019. A comparison between in-situ observations and DL-estimated MDA8 ozone values shows a correlation coefficient (R) of 0.95, an index of agreement (IOA) of 0.97, and a mean absolute bias (MAB) of 2.79 ppb, highlighting the promising performance of the deep convolutional neural network (Deep-CNN) at estimating surface MDA8 ozone. Spatial cross-validation also confirms the high spatial accuracy of the model, which obtains an R of 0.91, an IOA of 0.96, and an MAB of 3.46 ppb when it is trained and tested on separate stations. To interpret the black-box nature of our DL model, we use Shapley additive explanations (SHAP) to generate a spatial feature contribution map (SFCM), the results of which confirm an advanced ability of Deep-CNN to capture the interactions between most predictor variables and ozone. For instance, the SFCM for solar radiation (SRad) shows that higher SRad values enhance the formation of ozone, particularly in the south and southwestern CONUS. As SRad triggers ozone precursors to produce ozone via photochemical reactions, it increases ozone concentrations. The model also shows that low humidity increases ozone concentrations in the western mountainous regions. The negative correlation between humidity and ozone levels can be attributed to factors such as higher ozone decomposition resulting from increased levels of humidity and OH radicals. This study is the first to introduce the SFCM to investigate the spatial role of predictor variables in changes in estimated MDA8 ozone levels.
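The three agreement metrics named in this abstract (R, IOA, MAB) can be computed as in the sketch below. The abstract does not spell out its formulas, so two assumptions are made here: IOA is taken to be Willmott's index of agreement, and MAB is taken to be the mean of absolute prediction-minus-observation differences. Function names are illustrative.

```python
def correlation(pred, obs):
    """Pearson correlation coefficient R between predictions and observations."""
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = sum((p - mp) ** 2 for p in pred) ** 0.5
    so = sum((o - mo) ** 2 for o in obs) ** 0.5
    return cov / (sp * so)


def index_of_agreement(pred, obs):
    """Willmott's index of agreement (assumed definition of IOA).

    IOA = 1 - sum((P - O)^2) / sum((|P - mean(O)| + |O - mean(O)|)^2)
    """
    mo = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for p, o in zip(pred, obs))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for p, o in zip(pred, obs))
    return 1.0 - num / den


def mean_absolute_bias(pred, obs):
    """Mean of |P - O| (assumed definition of MAB), in the data's own units."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)
```

For a spatial cross-validation like the one described, the same functions would be applied to predictions at stations that were excluded from training.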
Affiliation(s)
- Masoud Ghahremanloo
  - Department of Earth and Atmospheric Sciences, University of Houston, Houston, TX, 77004, USA
- Yunsoo Choi
  - Department of Earth and Atmospheric Sciences, University of Houston, Houston, TX, 77004, USA
- Yannic Lops
  - Department of Earth and Atmospheric Sciences, University of Houston, Houston, TX, 77004, USA
|
32
|
Wu Y, Grant S, Chen W, Szarka A. Refining acute human exposure assessment to pesticides in surface water: An integrated data-driven modeling approach. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 865:161190. [PMID: 36581287 DOI: 10.1016/j.scitotenv.2022.161190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Revised: 12/03/2022] [Accepted: 12/21/2022] [Indexed: 06/17/2023]
Abstract
The substantial spatial and temporal variability of pesticides has led to large uncertainties when determining their peak aqueous concentrations. There is however a lack of large-scale studies dealing with accurate determination of annual maximum daily concentration (AMDC) across the landscape and over time based on the publicly available monitoring data. We developed a novel data-driven approach that firstly used time series modeling to generate AMDCs for qualified water monitoring sites in the conterminous U.S. With feature variables such as pesticide use and land cover compiled into the dataset, machine learning models using eXtreme Gradient Boosting (XGBoost) and Random Forest Regressor (RF) were then developed to estimate AMDCs in surface waters across the U.S. Both models exhibited significant predictability, while a hybrid model consisting of the average predictions by XGBoost and RF model had the highest prediction accuracy (mean absolute error (MAE): 1.23; R2: 0.61). The analysis of permutation variable importance indicated that pesticide use and drainage area were the two most important drivers. Partial dependence analysis revealed that pesticide use, precipitation, cultivated crop land cover and solubility exhibited concentration-promoting effects, whereas drainage area and molecular weight had concentration-demoting effects. Soil adsorption coefficient (Koc) showed nonmonotonic effects. The hybrid model was used to predict and map AMDCs of four example pesticides, including 2,4-dichlorophenoxyacetic acid (2,4-D), atrazine, glyphosate and imidacloprid during 2016-2019 at national scale. The predictive capability was validated using independent monitoring datasets. The fully evaluated approach significantly reduced the uncertainties in modeling annual peak concentrations and served as a valuable solution for conducting geographically oriented, highly refined exposure assessments for pesticides.
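The hybrid model described in this abstract averages the predictions of two fitted models and is scored with MAE and R2. A minimal sketch of that averaging and scoring step is given below. It assumes the two prediction lists already exist and that R2 means the coefficient of determination, which the abstract does not state explicitly; all names are illustrative.

```python
def hybrid_predict(pred_a, pred_b):
    """Element-wise average of two models' predictions for the same samples."""
    return [(a + b) / 2 for a, b in zip(pred_a, pred_b)]


def mean_absolute_error(pred, obs):
    """Mean of |prediction - observation|."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def r_squared(pred, obs):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed meaning of R2)."""
    mo = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical predictions from two models for three monitoring sites.
xgb_pred = [1.0, 3.0, 6.0]
rf_pred = [3.0, 5.0, 6.0]
observed = [2.0, 4.0, 6.0]
blended = hybrid_predict(xgb_pred, rf_pred)
```

In this toy case the two models err in opposite directions, so the averaged prediction matches the observations exactly. Real data will not cancel this cleanly, but it illustrates why a simple average can outperform either model alone.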
Affiliation(s)
- Yaoxing Wu
  - Product Safety, Syngenta Crop Protection LLC, Greensboro, NC 27409, USA
- Shanique Grant
  - Product Safety, Syngenta Crop Protection LLC, Greensboro, NC 27409, USA
- Wenlin Chen
  - Product Safety, Syngenta Crop Protection LLC, Greensboro, NC 27409, USA
- Arpad Szarka
  - Product Safety, Syngenta Crop Protection LLC, Greensboro, NC 27409, USA
|
33
|
Yang H, Ma W, Liu T, Li W. Assessing farmland suitability for agricultural machinery in land consolidation schemes in hilly terrain in China: A machine learning approach. FRONTIERS IN PLANT SCIENCE 2023; 14:1084886. [PMID: 36950352 PMCID: PMC10025464 DOI: 10.3389/fpls.2023.1084886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 01/31/2023] [Indexed: 06/18/2023]
Abstract
Identifying available farmland suitable for agricultural machinery is the most promising way of optimizing agricultural production and increasing agricultural mechanization. Farmland consolidation suitable for agricultural machinery (FCAM) is implemented as an effective tool for increasing sustainable production and mechanized agriculture. Using a machine learning approach, this study assesses the suitability of farmland for agricultural machinery in land consolidation schemes based on four parameters, i.e., natural resource endowment, accessibility of agricultural machinery, socioeconomic level, and ecological limitations. Based on "suitability" and "potential improvement in farmland productivity", we classified land into four zones: the priority consolidation zone, the moderate consolidation zone, the comprehensive consolidation zone, and the reserve consolidation zone. The results showed that most of the farmland (76.41%) was either basically or moderately suitable for FCAM. Although slope was often an indicator that land was suitable for agricultural machinery, other factors, such as the inferior accessibility of tractor roads, continuous depopulation, and ecological fragility, contributed greatly to reducing the overall suitability of land for FCAM. Moreover, it was estimated that the potential productivity of farmland would be increased by 720.8 kg/ha if FCAM were implemented. The four zones constituted a useful basis for determining the implementation sequence and differentiating strategies for FCAM schemes. Consequently, this zoning has been an effective solution for implementing FCAM schemes. However, the successful implementation of FCAM schemes, and the achievement of a modern and sustainable agriculture system, will require some additional strategies, such as strengthening farmland ecosystem protection and promoting R&D into agricultural machinery suitable for hilly terrain, as well as more financial support.
Affiliation(s)
- Heng Yang
  - College of Engineering, China Agricultural University, Beijing, China
- Wenqiu Ma
  - College of Engineering, China Agricultural University, Beijing, China
- Tongxin Liu
  - College of Engineering, China Agricultural University, Beijing, China
- Wenqing Li
  - Key Laboratory of Land Consolidation and Rehabilitation, Land Consolidation and Rehabilitation Center, Ministry of Natural Resources, Beijing, China
|
34
|
Parise O, Parise G, Vaidyanathan A, Occhipinti M, Gharaviri A, Tetta C, Bidar E, Maesen B, Maessen JG, La Meir M, Gelsomino S. Machine Learning to Identify Patients at Risk of Developing New-Onset Atrial Fibrillation after Coronary Artery Bypass. J Cardiovasc Dev Dis 2023; 10:jcdd10020082. [PMID: 36826578 PMCID: PMC9962068 DOI: 10.3390/jcdd10020082] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/04/2022] [Revised: 01/18/2023] [Accepted: 02/10/2023] [Indexed: 02/17/2023]
Abstract
BACKGROUND This study aims to develop an effective machine learning (ML) prediction model of new-onset postoperative atrial fibrillation (POAF) following coronary artery bypass grafting (CABG) and to highlight the most relevant clinical factors. METHODS Four ML algorithms were employed to analyze 394 patients undergoing CABG, and their performances were compared: Multivariate Adaptive Regression Spline, Neural Network, Random Forest, and Support Vector Machine. Each algorithm was applied to the training data set to choose the most important features and to build a predictive model. The best performance for each model was obtained through a hyperparameter search, and the Receiver Operating Characteristic Area Under the Curve metric was used to choose the best model. The best instances of each model were fed with the test data set, and several metrics were generated to assess the performance of the models on the unseen data set. A traditional logistic regression was also performed for comparison with the machine learning models. RESULTS The Random Forest model showed the best performance, and the top five predictive features included age, preoperative creatinine values, time of aortic cross-clamping, body surface area, and Logistic Euro-Score. CONCLUSIONS The use of ML for clinical predictions requires an accurate evaluation of the models and their hyperparameters. Random Forest outperformed all other models in the clinical prediction of POAF following CABG.
Affiliation(s)
- Orlando Parise
  - Cardiovascular Research Institute Maastricht (CARIM), Maastricht University Medical Centre, Universiteitssingel 50, 6229 ER Maastricht, The Netherlands
  - Department of Cardiac Surgery, Universitair Ziekenhuis Brussel, Laarbeeklaan 101, 1090 Brussels, Belgium
- Gianmarco Parise
  - Cardiovascular Research Institute Maastricht (CARIM), Maastricht University Medical Centre, Universiteitssingel 50, 6229 ER Maastricht, The Netherlands
- Ali Gharaviri
  - Institute of Computational Science, Università della Svizzera Italiana, 6900 Lugano, Switzerland
- Cecilia Tetta
  - Cardiovascular Research Institute Maastricht (CARIM), Maastricht University Medical Centre, Universiteitssingel 50, 6229 ER Maastricht, The Netherlands
- Elham Bidar
  - Cardiovascular Research Institute Maastricht (CARIM), Maastricht University Medical Centre, Universiteitssingel 50, 6229 ER Maastricht, The Netherlands
- Bart Maesen
  - Cardiovascular Research Institute Maastricht (CARIM), Maastricht University Medical Centre, Universiteitssingel 50, 6229 ER Maastricht, The Netherlands
- Jos G. Maessen
  - Cardiovascular Research Institute Maastricht (CARIM), Maastricht University Medical Centre, Universiteitssingel 50, 6229 ER Maastricht, The Netherlands
- Mark La Meir
  - Department of Cardiac Surgery, Universitair Ziekenhuis Brussel, Laarbeeklaan 101, 1090 Brussels, Belgium
- Sandro Gelsomino
  - Cardiovascular Research Institute Maastricht (CARIM), Maastricht University Medical Centre, Universiteitssingel 50, 6229 ER Maastricht, The Netherlands
  - Department of Cardiac Surgery, Universitair Ziekenhuis Brussel, Laarbeeklaan 101, 1090 Brussels, Belgium
|
35
|
Schumann A, Gaser C, Sabeghi R, Schulze PC, Festag S, Spreckelsen C, Bär KJ. Using machine learning to estimate the calendar age based on autonomic cardiovascular function. Front Aging Neurosci 2023; 14:899249. [PMID: 36755773 PMCID: PMC9899796 DOI: 10.3389/fnagi.2022.899249] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/18/2022] [Accepted: 12/20/2022] [Indexed: 01/24/2023]
Abstract
Introduction Aging is accompanied by physiological changes in cardiovascular regulation that can be evaluated using a variety of metrics. In this study, we employ machine learning on autonomic cardiovascular indices in order to estimate participants' age. Methods We analyzed a database including resting state electrocardiogram and continuous blood pressure recordings of healthy volunteers. A total of 884 data sets met the inclusion criteria. Data of 72 other participants with a BMI indicating obesity (>30 kg/m²) were withheld as an evaluation sample. For all participants, 29 different cardiovascular indices were calculated including heart rate variability, blood pressure variability, baroreflex function, pulse wave dynamics, and QT interval characteristics. Based on cardiovascular indices, sex and device, four different approaches were applied in order to estimate the calendar age of healthy subjects, i.e., relevance vector regression (RVR), Gaussian process regression (GPR), support vector regression (SVR), and linear regression (LR). To estimate age in the obese group, we drew normal-weight controls from the large sample to build a training set and a validation set that had an age distribution similar to the obesity test sample. Results In a five-fold cross validation scheme, we found the GPR model to be best suited to estimate calendar age, with a correlation of r=0.81 and a mean absolute error of MAE=5.6 years. In men, the error (MAE=5.4 years) seemed to be lower than that in women (MAE=6.0 years). GPR and SVR significantly overestimated the age of obese participants compared with normal-weight controls. The highest age gap indicated advanced cardiovascular aging by 5.7 years in obese participants. Discussion In conclusion, machine learning can be used to estimate age from cardiovascular function in a healthy population when considering previous models of biological aging. The estimated age might serve as a comprehensive and readily interpretable marker of cardiovascular function. Whether it is a useful risk predictor should be investigated in future studies.
Affiliation(s)
- Andy Schumann
  - Lab for Autonomic Neuroscience, Imaging and Cognition (LANIC), Department of Psychosomatic Medicine and Psychotherapy, Jena University Hospital, Jena, Germany
- Christian Gaser
  - Hans Berger Department of Neurology, Jena University Hospital, Jena, Germany
  - Department of Psychiatry and Psychotherapy, Jena University Hospital, Jena, Germany
- Rassoul Sabeghi
  - Lab for Autonomic Neuroscience, Imaging and Cognition (LANIC), Department of Psychosomatic Medicine and Psychotherapy, Jena University Hospital, Jena, Germany
- P. Christian Schulze
  - Department of Internal Medicine I, Division of Cardiology, Jena University Hospital, Jena, Germany
- Sven Festag
  - Institute of Medical Statistics, Computer and Data Sciences, Jena University Hospital, Jena, Germany
  - SMITH Consortium of the German Medical Informatics Initiative, Leipzig, Germany
- Cord Spreckelsen
  - Institute of Medical Statistics, Computer and Data Sciences, Jena University Hospital, Jena, Germany
  - SMITH Consortium of the German Medical Informatics Initiative, Leipzig, Germany
- Karl-Jürgen Bär
  - Lab for Autonomic Neuroscience, Imaging and Cognition (LANIC), Department of Psychosomatic Medicine and Psychotherapy, Jena University Hospital, Jena, Germany
|
36
|
Saha PK, Presto AA, Hankey S, Murphy BN, Allen C, Zhang W, Marshall JD, Robinson AL. National Exposure Models for Source-Specific Primary Particulate Matter Concentrations Using Aerosol Mass Spectrometry Data. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2022; 56:14284-14295. [PMID: 36153982 PMCID: PMC11809489 DOI: 10.1021/acs.est.2c03398] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/16/2023]
Abstract
This paper investigates the feasibility of developing national empirical models to predict ambient concentrations of sparsely monitored air pollutants at high spatial resolution. We used a data set of cooking organic aerosol (COA) and hydrocarbon-like organic aerosol (HOA; traffic primary organic PM) measured using aerosol mass spectrometry across the continental United States. The monitoring locations were selected to span the national distribution of land-use and source-activity variables commonly used for land-use regression modeling (e.g., road length, restaurant count, etc.). The models explain about 60% of the spatial variability of the measured data (R2 0.63 for the COA model and 0.62 for the HOA model). Extensive cross-validation suggests that the models are robust with reasonable transferability. The models predict large urban-rural and intra-urban variability with hotspots in urban areas and along the road corridors. The predicted national concentration surfaces show reasonable spatial correlation with source-specific national chemical transport model (CTM) simulations (R2: 0.45 for COA, 0.4 for HOA). Our measured data, empirical models, and CTM predictions all show that COA concentrations are about two times higher than HOA. Since COA and HOA are important contributors to the intra-urban spatial variability of the total PM2.5, our results highlight the potential importance of controlling commercial cooking emissions for air quality management in the United States.
Affiliation(s)
- Provat K. Saha
  - Center for Atmospheric Particle Studies, Carnegie Mellon University, Pittsburgh, Pennsylvania, 15213, USA
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, 15213, USA
- Albert A. Presto
  - Center for Atmospheric Particle Studies, Carnegie Mellon University, Pittsburgh, Pennsylvania, 15213, USA
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, 15213, USA
- Steve Hankey
  - School of Public and International Affairs, Virginia Tech, Blacksburg, Virginia, 24061, USA
- Benjamin N. Murphy
  - Center for Environmental Measurement and Modeling, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, 27709, USA
- Chris Allen
  - General Dynamics Information Technology, Research Triangle Park, North Carolina 27711, United States
- Wenwen Zhang
  - Department of Public Informatics, Rutgers University, New Brunswick, NJ 08901
- Julian D. Marshall
  - Department of Civil and Environmental Engineering, University of Washington, Seattle, Washington, 98195, USA
- Allen L. Robinson
  - Center for Atmospheric Particle Studies, Carnegie Mellon University, Pittsburgh, Pennsylvania, 15213, USA
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, 15213, USA
|
37
|
De Marco A, Garcia-Gomez H, Collalti A, Khaniabadi YO, Feng Z, Proietti C, Sicard P, Vitale M, Anav A, Paoletti E. Ozone modelling and mapping for risk assessment: An overview of different approaches for human and ecosystems health. ENVIRONMENTAL RESEARCH 2022; 211:113048. [PMID: 35257686 DOI: 10.1016/j.envres.2022.113048] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 09/14/2021] [Revised: 12/07/2021] [Accepted: 02/25/2022] [Indexed: 06/14/2023]
Abstract
Tropospheric ozone (O3) is one of the air pollutants of greatest concern due to its widespread impacts on land vegetated ecosystems and human health. Ozone is also the third most important greenhouse gas in terms of radiative forcing. Consequently, it should be carefully and continuously monitored to estimate its potential adverse impacts, especially in those regions where concentrations are high. Continuous large-scale measurement of O3 concentrations is crucial but may be unfeasible because of economic and practical limitations; therefore, quantifying the real impact of O3 over large areas is currently an open challenge. Thus, one of the final objectives of O3 modelling is to reproduce maps of continuous concentrations (both spatially and temporally) and risk assessment for human and ecosystem health. Here we reviewed the most relevant approaches used for O3 modelling and mapping, starting from the simplest geo-statistical approaches and increasing in complexity up to simulations embedded into global/regional circulation models, and the pros and cons of each model are highlighted. The analysis showed that a simpler approach (mostly statistical models) is suitable for mapping O3 concentrations at the local scale, where enough O3 concentration data are available. The associated error in mapping can be reduced by using more complex methodologies, based on co-variables. The models available at the regional or global level are used depending on the needed resolution and the domain to which they are applied. Increasing the resolution corresponds to an improvement in the prediction, but only up to a certain limit. However, with any approach, ensemble models should be preferred.
Affiliation(s)
- Alessio Collalti
- Forest Modelling Lab., ISAFOM-CNR, Via Madonna Alta, Perugia, Italy
- Yusef Omidi Khaniabadi
- Department of Environmental Health Engineering, Industrial Medial and Health, Petroleum Industry Health Organization (PIHO), Ahvaz, Iran
- Zhaozhong Feng
- Key Laboratory of Agro-meteorology of Jiangsu Province, School of Applied Meteorology, Nanjing University of Information Science & Technology, Nanjing, 210044, China
- Marcello Vitale
- Sapienza University of Rome, Piazzale Aldo Moro, Rome, Italy
- Elena Paoletti
- IRET-CNR, Via Madonna Del Piano, Sesto Fiorentino, Florence, Italy
38
Lyu Y, Ju Q, Lv F, Feng J, Pang X, Li X. Spatiotemporal variations of air pollutants and ozone prediction using machine learning algorithms in the Beijing-Tianjin-Hebei region from 2014 to 2021. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2022; 306:119420. [PMID: 35526642 DOI: 10.1016/j.envpol.2022.119420] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/01/2022] [Revised: 04/16/2022] [Accepted: 05/02/2022] [Indexed: 05/16/2023]
Abstract
China was seriously affected by air pollution in the past decade, especially for particulate matter (PM) and emerging ozone pollution recently. In this study, we systematically examined the spatiotemporal variations of six air pollutants and conducted ozone prediction using machine learning (ML) algorithms in the Beijing-Tianjin-Hebei (BTH) region. The annual-average concentrations of CO, PM10, PM2.5 and SO2 decreased at a rate of 141, 11.0, 6.6 and 5.6 μg/m3/year, while a pattern of initial increase and later decrease was observed for NO2 and O3_8 h. The concentration of SO2, CO and NO2 was higher in Tangshan and Xingtai, while northern BTH region has lower levels of CO, NO2 and PM. Spatial variations of ozone were relatively small in the BTH region. Monthly variations of PM10 displayed an increase in March probably due to wind-blown dusts from Northwest China. A seasonal and diurnal pattern with summer and afternoon peaks was found for ozone, which was contrast with other pollutants. Further ML algorithms such as Random Forest (RF) model and Decision tree (DT) regression showed good ozone prediction performance (daily: R2 = 0.83 and 0.73, RMSE = 30.0 and 37.3 μg/m3, respectively; monthly: R2 = 0.93 and 0.88, RMSE = 12.1 and 15.8 μg/m3, respectively) based on 10-fold cross-validation. Both RF model and DT regression relied more on the spatial trend as higher temporal prediction performance was achieved. Solar radiation- and temperature-related variables presented high importance at daily level, whereas sea level pressure dominated at monthly level. The spatiotemporal heterogeneity in variable importance was further confirmed using case studies based on RF model. In addition, variable importance was possibly influenced by the emission reductions due to COVID-19 pandemic. Despite its possible weakness to capture ozone extremes, RF model was beneficial and suggested for predicting spatiotemporal variations of ozone in future studies.
Affiliation(s)
- Yan Lyu
- College of Environment, Zhejiang University of Technology, Hangzhou, 310032, China
- Qinru Ju
- School of Accounting, Southwestern University of Finance and Economics, Chengdu, 611130, China
- Fengmao Lv
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756, China
- Jialiang Feng
- School of Environmental and Chemical Engineering, Shanghai University, Shanghai, 200444, China
- Xiaobing Pang
- College of Environment, Zhejiang University of Technology, Hangzhou, 310032, China.
- Xiang Li
- Department of Environmental Science & Engineering, Fudan University, Shanghai, 200438, China
39
Gauthier-Manuel H, Mauny F, Boilleaut M, Ristori M, Pujol S, Vasbien F, Parmentier AL, Bernard N. Improvement of downscaled ozone concentrations from the transnational scale to the kilometric scale: Need, interest and new insights. ENVIRONMENTAL RESEARCH 2022; 210:112947. [PMID: 35183519 DOI: 10.1016/j.envres.2022.112947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2021] [Revised: 02/04/2022] [Accepted: 02/09/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND Ground-level ozone is a major public health issue worldwide. An accurate assessment of ozone exposure is necessary. Modeling tools have been developed to tackle this issue in large areas. However, these models could present inaccuracies at the local scale. OBJECTIVES The objective of this study was i) to assess whether O3 concentrations estimated by transnational modeling at the kilometric scale (9 km2) could be improved, ii) to propose a potential correction of these downscaled ozone concentrations and iii) to evaluate the efficiency and applicability of such a correction. METHOD The present work was carried out in three phases. First, the performance of a transnational modeling platform (PREV'EST) was assessed at 6 geographic points by comparison with data from 6 air quality monitoring stations. Performance indicators were used for this purpose (MBE (mean bias error), MAE (mean absolute error), RMSE (root mean square error), r (Pearson correlation coefficient), and target plots). Second, several corrections were developed using MARS (multivariate adaptive regression splines) and integrating different sets of variables (mean temperature, relative humidity, rainfall amount, wind speed, elevation, and date). Their performance was evaluated. Third, external validation of the corrections was conducted using the data from six additional air quality monitoring stations. RESULTS The uncorrected PREV'EST model presented a lack of exactitude and precision. These concentrations did not reproduce the interday variability of the measurements, leading to a lack of temporal contrast in exposure data. For the best performance enhancement, the correction applied improved MBE, MAE, RMSE and r from 14.67, 19.23, 23.18 and 0.67 to 0.00, 8.00, 10.19 and 0.91, respectively. External validation confirmed the efficiency of the corrections at the regional scale. CONCLUSIONS We propose a validated and efficient methodology integrating local environmental variables. 
The methodology is adaptable according to the context, needs and data available.
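The four scalar performance indicators named in this abstract (MBE, MAE, RMSE and Pearson r) have standard definitions, which can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `evaluation_metrics`, the sign convention for MBE (modelled minus observed) and the sample values are assumptions made for the example.

```python
import math


def evaluation_metrics(modelled, observed):
    """Return MBE, MAE, RMSE and Pearson r for paired series.

    Assumption: errors are taken as modelled minus observed, so a
    positive MBE means the model overestimates on average.
    Both series must vary, otherwise r is undefined (division by zero).
    """
    if len(modelled) != len(observed) or len(modelled) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(modelled)
    errors = [m - o for m, o in zip(modelled, observed)]

    mbe = sum(errors) / n                             # mean bias error
    mae = sum(abs(e) for e in errors) / n             # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # root mean square error

    # Pearson correlation coefficient from centred sums
    mean_m = sum(modelled) / n
    mean_o = sum(observed) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(modelled, observed))
    var_m = sum((m - mean_m) ** 2 for m in modelled)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    r = cov / math.sqrt(var_m * var_o)

    return {"MBE": mbe, "MAE": mae, "RMSE": rmse, "r": r}


# Hypothetical concentrations, not data from the study:
# the model is offset from the observations by a constant +10.
observed = [40.0, 55.0, 70.0, 62.0]
modelled = [50.0, 65.0, 80.0, 72.0]
metrics = evaluation_metrics(modelled, observed)
```

In this made-up case the constant offset shows up in MBE, MAE and RMSE while r stays at 1, which illustrates why bias and correlation indicators are reported together.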
Affiliation(s)
- Honorine Gauthier-Manuel
- UMR 6249, Laboratoire Chrono-environnement, Université de Bourgogne Franche-Comté, CNRS, 25000, Besançon, France; Unité de Méthodologie en Recherche Clinique, épidémiologie et Santé Publique (uMETh), Inserm CIC 1431, CHU, 25030, Besançon Cedex, France.
- Frédéric Mauny
- UMR 6249, Laboratoire Chrono-environnement, Université de Bourgogne Franche-Comté, CNRS, 25000, Besançon, France; Unité de Méthodologie en Recherche Clinique, épidémiologie et Santé Publique (uMETh), Inserm CIC 1431, CHU, 25030, Besançon Cedex, France
- Marie Ristori
- ATMO Bourgogne-Franche-Comté, 25000, Besançon, France
- Sophie Pujol
- UMR 6249, Laboratoire Chrono-environnement, Université de Bourgogne Franche-Comté, CNRS, 25000, Besançon, France; Unité de Méthodologie en Recherche Clinique, épidémiologie et Santé Publique (uMETh), Inserm CIC 1431, CHU, 25030, Besançon Cedex, France
- Anne-Laure Parmentier
- UMR 6249, Laboratoire Chrono-environnement, Université de Bourgogne Franche-Comté, CNRS, 25000, Besançon, France; Unité de Méthodologie en Recherche Clinique, épidémiologie et Santé Publique (uMETh), Inserm CIC 1431, CHU, 25030, Besançon Cedex, France
- Nadine Bernard
- UMR 6249, Laboratoire Chrono-environnement, Université de Bourgogne Franche-Comté, CNRS, 25000, Besançon, France; Centre National de La Recherche Scientifique, UMR 6049, Laboratoire ThéMA, Université de Bourgogne Franche-Comté, 25000, Besançon, France
40
Xu J, Yang W, Bai Z, Zhang R, Zheng J, Wang M, Zhu T. Modeling spatial variation of gaseous air pollutants and particulate matters in a Metropolitan area using mobile monitoring data. ENVIRONMENTAL RESEARCH 2022; 210:112858. [PMID: 35149107 PMCID: PMC9203245 DOI: 10.1016/j.envres.2022.112858] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/20/2021] [Revised: 01/04/2022] [Accepted: 01/26/2022] [Indexed: 06/14/2023]
Abstract
Geo-statistical models have been applied to assess fine-scale air pollution exposures in epidemiological studies. Many of the models were developed for criteria air pollutants rather than others that have not been regulated (e.g., ultrafine particles, black carbon, and benzene) which may also be harmful to human health. We aim to develop spatial models for regulated and non-regulated air pollutants using 6 algorithms and compare their prediction performances. A mobile platform with fast-response monitors was used to measure gaseous air pollutants (nitrogen dioxides, carbon monoxide, sulfur dioxides, ozone, benzene, toluene, methanol) and particulate matters (black carbon, surface area, count- and volume-concentrations of ultrafine particles) in Beijing, China for 30 days from July to October 2008. Mobile monitoring data for model building were spatially aggregated into 130 road segments of approximately 600-m interval on the sampling routes after temporal adjustment of background concentrations. The best models for the air pollutants were dominated by traffic variables, which explained more than 60% of the spatial variations (range: 0.61 for methanol to 0.88 for ozone) based on the highest cross-validation R2 and the lowest root mean square error among different algorithms. Amongst the 6 algorithms, the spatial models using partial least squares regression (PLS, a dimension reduction algorithm) and random forest (RF, a machine learning algorithm) algorithms outperformed the models with other algorithms. Exposure predictions from the best models varied substantially with distinct spatial patterns between the air pollutants. Predictions with multiple modeling algorithms were moderately correlated with each other for the same pollutant at the fine-scale grids across the city. 
Exposure models, especially based on PLS and RF algorithms, captured the spatial variation of short-term average concentrations, had adequate predictive validity, and could be applied to assess toxic air pollutant exposures in human health studies.
Affiliation(s)
- Jia Xu
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, China; Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, WA, United States
- Wen Yang
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, China
- Zhipeng Bai
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, China
- Renyi Zhang
- Department of Atmospheric Sciences, Texas A&M University, Center for Atmospheric Chemistry and the Environment, College Station, TX, United States
- Jun Zheng
- Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing, China
- Meng Wang
- Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, WA, United States; Department of Epidemiology and Environmental Health, University at Buffalo, Buffalo, NY, United States; RENEW Institute, University at Buffalo, Buffalo, NY, United States.
- Tong Zhu
- BIC-ESAT and SKL-ESPC, College of Environmental Sciences and Engineering, Peking University, Beijing, China.
41
Zhang R, Lai KY, Liu W, Liu Y, Lu J, Tian L, Webster C, Luo L, Sarkar C. Community-level ambient fine particulate matter and seasonal influenza among children in Guangzhou, China: A Bayesian spatiotemporal analysis. THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 826:154135. [PMID: 35227720 DOI: 10.1016/j.scitotenv.2022.154135] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/16/2021] [Revised: 02/21/2022] [Accepted: 02/21/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND Influenza is a major preventable infectious respiratory disease. However, there is little detailed long-term evidence of its associations with PM2.5 among children. We examined the community-level associations between exposure to ambient PM2.5 and incident influenza in Guangzhou, China. METHODS We used data from the city-wide influenza surveillance system collected by Guangzhou Centre for Disease Control and Prevention (GZCDC) over the period 2013 and 2019. Incident influenza was defined as daily new influenza (both clinically diagnosed and laboratory confirmed) cases as per standard diagnostic criteria. A 200-meter city-wide grid of daily ambient PM2.5 exposure was generated using a random forest model. We developed spatiotemporal Bayesian hierarchical models to examine the community-level associations between PM2.5 and the influenza adjusting for meteorological and socioeconomic variables and accounting for spatial autocorrelation. We also calculated community-wide influenza cases attributable to PM2.5 levels exceeding the China Grade 1 and World Health Organization (WHO) regulatory thresholds. RESULTS Our study comprised N = 191,846 children from Guangzhou aged ≤19 years and diagnosed with influenza between January 1, 2013 and December 31, 2019. Each 10 μg/m3 increment in community-level PM2.5 measured on the day of case confirmation (lag 0) and over a 6-day moving average (lag 0-5 days) was associated with higher risks of influenza (RR = 1.05, 95% CI: 1.05-1.06 for lag 0 and RR = 1.15, 95% CI: 1.14-1.16 for lag 0-5). We estimated that 8.10% (95%CI: 7.23%-8.57%) and 20.11% (95%CI: 17.64%-21.48%) influenza cases respectively were attributable to daily PM2.5 exposure exceeding the China Grade I (35 μg/m3) and the WHO limits (25 μg/m3). The risks associated with PM2.5 exposures were more pronounced among children of the age-group 10-14 compared to other age groups. 
CONCLUSIONS More targeted non-pharmaceutical interventions aimed at reducing PM2.5 exposures at home, school and during commutes among children may constitute additional influenza prevention and control policies.
Affiliation(s)
- Rong Zhang
- Healthy High Density Cities Lab, HKUrbanLab, The University of Hong Kong, Knowles Building, Pokfulam Road, Pokfulam, Hong Kong, China
- Ka Yan Lai
- Healthy High Density Cities Lab, HKUrbanLab, The University of Hong Kong, Knowles Building, Pokfulam Road, Pokfulam, Hong Kong, China
- Wenhui Liu
- Guangzhou Center for Disease Control and Prevention, Guangzhou, Guangdong, China
- Yanhui Liu
- Guangzhou Center for Disease Control and Prevention, Guangzhou, Guangdong, China
- Jianyun Lu
- Guangzhou Center for Disease Control and Prevention, Guangzhou, Guangdong, China
- Linwei Tian
- School of Public Health, The University of Hong Kong, Patrick Mason Building, Sassoon Road, Pokfulam, Hong Kong, China
- Chris Webster
- Healthy High Density Cities Lab, HKUrbanLab, The University of Hong Kong, Knowles Building, Pokfulam Road, Pokfulam, Hong Kong, China
- Lei Luo
- Guangzhou Center for Disease Control and Prevention, Guangzhou, Guangdong, China.
- Chinmoy Sarkar
- Healthy High Density Cities Lab, HKUrbanLab, The University of Hong Kong, Knowles Building, Pokfulam Road, Pokfulam, Hong Kong, China.
42
Wang S, Mu X, Jiang P, Huo Y, Zhu L, Zhu Z, Wu Y. New Deep Learning Model to Estimate Ozone Concentrations Found Worrying Exposure Level over Eastern China. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:7186. [PMID: 35742435 PMCID: PMC9223487 DOI: 10.3390/ijerph19127186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 06/08/2022] [Accepted: 06/09/2022] [Indexed: 11/21/2022]
Abstract
Ozone (O3), whose concentrations have been increasing in eastern China recently, plays a key role in human health, biodiversity, and climate change. Accurate information about the spatiotemporal distribution of O3 is crucial for human exposure studies. We developed a deep learning model based on a long short-term memory (LSTM) network to estimate the daily maximum 8 h average (MDA8) O3 across eastern China in 2020. The proposed model combines LSTM with an attentional mechanism and residual connection structure. The model employed total O3 column product from the Tropospheric Monitoring Instrument, meteorological data, and other covariates as inputs. Then, the estimates from our model were compared with real observations of the China air quality monitoring network. The results indicated that our model performed better than other traditional models, such as the random forest model and deep neural network. The sample-based cross-validation R2 and RMSE of our model were 0.94 and 10.64 μg m−3, respectively. Based on the O3 distribution over eastern China derived from the model, we found that people in this region suffered from excessive O3 exposure. Approximately 81% of the population in eastern China was exposed to MDA8 O3 > 100 μg m−3 for more than 150 days in 2020.
Affiliation(s)
- Sichen Wang
- School of Resources and Environmental Engineering, Anhui University, Hefei 230601, China
- Xi Mu
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Hefei 230601, China
- Peng Jiang
- School of Resources and Environmental Engineering, Anhui University, Hefei 230601, China
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Hefei 230601, China
- Anhui Province Engineering Laboratory for Mine Ecological Remediation, Anhui University, Hefei 230601, China
- Yanfeng Huo
- Anhui Institute of Meteorological Sciences, Hefei 230031, China
- Li Zhu
- School of Resources and Environmental Engineering, Anhui University, Hefei 230601, China
- Zhiqiang Zhu
- School of Resources and Environmental Engineering, Anhui University, Hefei 230601, China
- Yanlan Wu
- School of Resources and Environmental Engineering, Anhui University, Hefei 230601, China
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Hefei 230601, China
43
Tian Y, deSouza P, Mora S, Yao X, Duarte F, Norford LK, Lin H, Ratti C. Evaluating the Meteorological Effects on the Urban Form-Air Quality Relationship Using Mobile Monitoring. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2022; 56:7328-7336. [PMID: 35075907 DOI: 10.1021/acs.est.1c04854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Predictive models based on mobile measurements have been increasingly used to understand the spatiotemporal variations of intraurban air quality. However, the effects of meteorological factors, which significantly affect the dispersion of air pollution, on the urban-form-air-quality relationship have not been understood on a granular level. We attempt to fill this gap by developing predictive models of particulate matter (PM) in the Bronx (New York City) using meteorological and urban form parameters. The granular PM data was collected by mobile low-cost sensors as the ground truth. To evaluate the effects of meteorological factors, we compared the performance of models using the urban form within fixed and wind-sensitive buffers, respectively. We find better predictive power in the wind-sensitive group (R = 0.85) for NC10 (number concentration for particles with diameters of 1 μm-10 μm) than the control group (R = 0.01), and modest improvements for PM2.5 (R = 0.84 for the wind sensitive group, R = 0.77 for the control group), indicating that incorporating meteorological factors improved the predictive power of our models. We also found that urban form factors account for 62.95% of feature importance for NC10 and 14.90% for PM2.5 (9.99% and 4.91% for 3-D and 2-D urban form factors, respectively) in our Random Forest models. It suggests the importance of incorporating urban form factors, especially for the uncommonly used 3-D characteristics, in estimating intraurban PM. Our method can be applied in other cities to better capture the influence of urban context on PM levels.
Affiliation(s)
- Ye Tian
- School of Geography and Environment, Jiangxi Normal University, Nanchang, 330022, China
- Senseable City Laboratory, Department of Urban Studies and Planning, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Geography, University of Georgia, Athens, Georgia 30602, United States
- Priyanka deSouza
- Department of Urban Studies and Planning, University of Colorado Denver, Denver, Colorado 80202, United States
- Simone Mora
- Senseable City Laboratory, Department of Urban Studies and Planning, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Xiaobai Yao
- Department of Geography, University of Georgia, Athens, Georgia 30602, United States
- Fabio Duarte
- Senseable City Laboratory, Department of Urban Studies and Planning, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Pontifícia Universidade Católica do Paraná, Curitiba, 80215 Brazil
- Leslie K Norford
- Department of Architecture, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Hui Lin
- School of Geography and Environment, Jiangxi Normal University, Nanchang, 330022, China
- Carlo Ratti
- Senseable City Laboratory, Department of Urban Studies and Planning, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
44
Wijnands JS, Nice KA, Seneviratne S, Thompson J, Stevenson M. The impact of the COVID-19 pandemic on air pollution: A global assessment using machine learning techniques. ATMOSPHERIC POLLUTION RESEARCH 2022; 13:101438. [PMID: 35506000 PMCID: PMC9047632 DOI: 10.1016/j.apr.2022.101438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2021] [Revised: 04/21/2022] [Accepted: 04/22/2022] [Indexed: 06/14/2023]
Abstract
In response to the COVID-19 pandemic, most countries implemented public health ordinances that resulted in restricted mobility and a resultant change in air quality. This has provided an opportunity to quantify the extent to which carbon-based transport and industrial activity affect air quality. However, quantification of these complex effects has proven to be difficult, depending on the stringency of restrictions, country-specific emission source profiles, long-term trends and meteorological effects on atmospheric chemistry, emission levels and in-flow from nearby countries. In this study, confounding factors were disentangled for a direct comparison of pandemic-related reductions in absolute pollution levels, globally. The non-linear relationships between atmospheric processes and daily ground-level NO2, PM10, PM2.5 and O3 measurements were captured in city- and pollutant-specific XGBoost models for over 700 cities, adjusting for weather, seasonality and trends. City-level modelling allowed adaptation to the distinct topography, urban morphology, climate and atmospheric conditions for each city, individually, as the weather variables that were most predictive varied across cities. Pollution forecasts for 2020 in absence of a pandemic were generated based on weather and formed an ensemble for country-level pollution reductions. Findings were robust to modelling assumptions and consistent with various published case studies. NO2 reduced most in China, Europe and India, following severe government restrictions as part of the initial lockdowns. Reductions were highly correlated with changes in mobility levels, especially trips to transit stations, workplaces, retail and recreation venues. Further, NO2 did not fully revert to pre-pandemic levels in 2020. Ambient PM2.5 pollution, which has severe adverse health consequences, reduced most in China and India. 
Since positive health effects could be offset to some extent by prolonged exposure to indoor pollution, alternative transport initiatives could prove to be an important pathway towards better health outcomes in these countries. Increased O3 levels during initial lockdowns have been documented widely. However, our analyses also found a subsequent reduction in O3 for many countries below what was expected based on meteorological conditions during summer months (e.g., China, United Kingdom, France, Germany, Poland, Turkey). The effects in periods with high O3 levels are especially important for the development of effective mitigation strategies to improve health outcomes.
Affiliation(s)
- Jasper S Wijnands
- Transport, Health and Urban Design Research Lab, Melbourne School of Design, The University of Melbourne, Parkville VIC 3010, Australia
- Royal Netherlands Meteorological Institute (KNMI), 3731 GA De Bilt, The Netherlands
- Kerry A Nice
- Transport, Health and Urban Design Research Lab, Melbourne School of Design, The University of Melbourne, Parkville VIC 3010, Australia
- Sachith Seneviratne
- Transport, Health and Urban Design Research Lab, Melbourne School of Design, The University of Melbourne, Parkville VIC 3010, Australia
- Jason Thompson
- Transport, Health and Urban Design Research Lab, Melbourne School of Design, The University of Melbourne, Parkville VIC 3010, Australia
- Mark Stevenson
- Transport, Health and Urban Design Research Lab, Melbourne School of Design, The University of Melbourne, Parkville VIC 3010, Australia
- Faculty of Engineering and Information Technology, The University of Melbourne, Parkville VIC 3010, Australia
- Melbourne School of Population and Global Health, The University of Melbourne, Parkville VIC 3010, Australia
45
Meng X, Wang W, Shi S, Zhu S, Wang P, Chen R, Xiao Q, Xue T, Geng G, Zhang Q, Kan H, Zhang H. Evaluating the spatiotemporal ozone characteristics with high-resolution predictions in mainland China, 2013-2019. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2022; 299:118865. [PMID: 35063542 DOI: 10.1016/j.envpol.2022.118865] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Received: 11/02/2021] [Revised: 12/24/2021] [Accepted: 01/15/2022] [Indexed: 06/14/2023]
Abstract
Evaluating ozone levels at high resolutions and accuracy is crucial for understanding the spatiotemporal characteristics of ozone distribution and assessing ozone exposure levels in epidemiological studies. The national models with high spatiotemporal resolutions to predict ground ozone concentrations are limited in China so far. In this study, we aimed to develop a random forest model by combining ground ozone measurements from fixed stations, ozone simulations from the Community Multiscale Air Quality (CMAQ) modeling system, meteorological parameters, population density, road length, and elevation to predict ground maximum daily 8-h average (MDA8) ozone concentrations at a daily level and 1 km × 1 km spatial resolution. The model cross-validation R2 and root mean squared error (RMSE) were 0.80 and 20.93 μg/m3 at daily level in 2013-2019, respectively. CMAQ ozone simulations and near-surface temperature played vital roles in predicting ozone concentrations among all predictors. The population-weighted median concentrations of predicted MDA8 ozone were 89.34 μg/m3 in mainland China in 2013, and reached 100.96 μg/m3 in 2019. However, the long-term temporal variations among regions were heterogeneous. Central and Eastern China, as well as the Southeast Coastal Area, suffered higher ozone pollution and higher increased rates of ozone concentrations from 2013 to 2019. The seasonal pattern of ozone pollution varied spatially. The peak-season ozone pollution with the highest 6-month ozone concentrations occurred in different months among regions, with more than half domain in April-September. 
The predictions showed that not only the annual mean concentrations but also the percentages of grid-days with MDA8 ozone concentrations higher than 100/160 μg/m3 have been increasing in the past few years in China; meanwhile, majority areas in mainland China suffered peak-season ozone concentrations higher than the air quality guidelines launched by the World Health Organization in September 2021. The proposed model and ozone predictions with high spatiotemporal resolution and full coverage could provide health studies with flexible choices to evaluate ozone exposure levels at multiple spatiotemporal scales in the future.
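The abstract reports population-weighted median concentrations without stating how they were computed. One common definition is sketched here, purely as an illustration: the weighted median is taken as the smallest value at which the cumulative population reaches half of the total. The function name `weighted_median` and the grid-cell values are hypothetical.

```python
def weighted_median(values, weights):
    """Lower weighted median: the smallest value whose cumulative weight
    reaches at least half of the total weight.

    Assumption: this is one common definition; the study may have used
    a different one (for example, with interpolation between values).
    """
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")

    cumulative = 0.0
    # Walk through the values in ascending order, accumulating population
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= total / 2:
            return value


# Hypothetical grid cells: ozone concentration and resident population.
concentrations = [80.0, 95.0, 110.0]
population = [1000, 1000, 6000]
pw_median = weighted_median(concentrations, population)
```

Because most of the hypothetical population lives in the most polluted cell, the weighted median here is the highest concentration, whereas the unweighted median of the three cells would be the middle one.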
Affiliation(s)
- Xia Meng
- School of Public Health, Key Laboratory of Public Health Safety of the Ministry of Education and Key Laboratory of Health Technology Assessment of the Ministry of Health, Fudan University, Shanghai, 200032, China
- Weidong Wang
- School of Public Health, Key Laboratory of Public Health Safety of the Ministry of Education and Key Laboratory of Health Technology Assessment of the Ministry of Health, Fudan University, Shanghai, 200032, China
- Su Shi
- School of Public Health, Key Laboratory of Public Health Safety of the Ministry of Education and Key Laboratory of Health Technology Assessment of the Ministry of Health, Fudan University, Shanghai, 200032, China
- Shengqiang Zhu
- Department of Environmental Science and Engineering, Fudan University, Shanghai, 200438, China
- Peng Wang
- Department of Atmospheric and Oceanic Sciences, Fudan University, Shanghai, 200438, China
- Renjie Chen
- School of Public Health, Key Laboratory of Public Health Safety of the Ministry of Education and Key Laboratory of Health Technology Assessment of the Ministry of Health, Fudan University, Shanghai, 200032, China
- Qingyang Xiao
- State Key Joint Laboratory of Environment Simulation and Pollution Control, School of Environment, Tsinghua University, Beijing, 100084, China
- Tao Xue
- Institute of Reproductive and Child Health/Ministry of Health Key Laboratory of Reproductive Health and Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, 100191, China
- Guannan Geng
- State Key Joint Laboratory of Environment Simulation and Pollution Control, School of Environment, Tsinghua University, Beijing, 100084, China
- Qiang Zhang
- Ministry of Education Key Laboratory for Earth System Modeling, Department of Earth System Science, Tsinghua University, Beijing, 100084, China
- Haidong Kan
- School of Public Health, Key Laboratory of Public Health Safety of the Ministry of Education and Key Laboratory of Health Technology Assessment of the Ministry of Health, Fudan University, Shanghai, 200032, China
- Hongliang Zhang
- Department of Environmental Science and Engineering, Fudan University, Shanghai, 200438, China.
|
46
|
Ren X, Mi Z, Cai T, Nolte CG, Georgopoulos PG. Flexible Bayesian Ensemble Machine Learning Framework for Predicting Local Ozone Concentrations. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2022; 56:3871-3883. [PMID: 35312316 PMCID: PMC9133919 DOI: 10.1021/acs.est.1c04076] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 05/25/2023]
Abstract
3D-grid-based chemical transport models, such as the Community Multiscale Air Quality (CMAQ) modeling system, have been widely used for predicting concentrations of ambient air pollutants. However, typical horizontal resolutions of nationwide CMAQ simulations (12 × 12 km²) cannot capture local-scale gradients for accurately assessing human exposures and environmental justice disparities. In this study, a Bayesian ensemble machine learning (BEML) framework, which integrates 13 learning algorithms, was developed for downscaling CMAQ estimates of ozone daily maximum 8 h averages to the census tract level, across the contiguous US, and was demonstrated for 2011. Three-stage hyperparameter tuning and targeted validations were designed to ensure the ensemble model's ability to interpolate, extrapolate, and capture concentration peaks. The Shapley value metric from coalitional game theory was applied to interpret the drivers of subgrid gradients. The flexibility (transferability) of the 2011-trained BEML model was further tested by evaluating its ability to estimate fine-scale concentrations for other years (2012-2017) without retraining. To demonstrate the feasibility of using the BEML approach in strictly "data-limited" situations, the model was applied to downscale CMAQ outputs for a future-year scenario-based simulation that considers effects of variations in meteorology associated with climate change.
Affiliation(s)
- Xiang Ren
- Environmental and Occupational Health Sciences Institute (EOHSI), Rutgers University, Piscataway, NJ 08854, USA
- Department of Chemical and Biochemical Engineering, Rutgers University, Piscataway, NJ 08854, USA
| | - Zhongyuan Mi
- Environmental and Occupational Health Sciences Institute (EOHSI), Rutgers University, Piscataway, NJ 08854, USA
- Department of Environmental Sciences, Rutgers University, New Brunswick, NJ 08901, USA
| | - Ting Cai
- Environmental and Occupational Health Sciences Institute (EOHSI), Rutgers University, Piscataway, NJ 08854, USA
| | - Christopher G. Nolte
- Center for Environmental Measurement and Modeling, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
| | - Panos G. Georgopoulos
- Environmental and Occupational Health Sciences Institute (EOHSI), Rutgers University, Piscataway, NJ 08854, USA
- Department of Chemical and Biochemical Engineering, Rutgers University, Piscataway, NJ 08854, USA
- Department of Environmental Sciences, Rutgers University, New Brunswick, NJ 08901, USA
- Department of Environmental and Occupational Health and Justice, Rutgers School of Public Health, Piscataway, NJ 08854, USA
|
47
|
A High-Performance Convolutional Neural Network for Ground-Level Ozone Estimation in Eastern China. REMOTE SENSING 2022. [DOI: 10.3390/rs14071640] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/01/2023]
Abstract
Having a high-quality historical air pollutant dataset is critical for environmental and epidemiological research. In this study, a novel deep learning model based on convolutional neural network architecture was developed to estimate ground-level ozone concentrations across eastern China. A high-resolution maximum daily average 8-hour (MDA8) surface ozone concentration dataset was generated with the support of the total ozone column from the satellite Tropospheric Monitoring Instrument, meteorological data from the China Meteorological Administration Land Data Assimilation System, and simulations of the WRF-Chem model. The modeled results were compared with in situ measurements in five cities that were not involved in model training, and the mean R² between predicted and observed ozone was 0.9, indicating that the model is robust. In addition, we compared the model results with some widely used machine learning techniques (e.g., random forest) and recently published ozone datasets, showing that the accuracy of our model is higher and that the spatial distributions of predicted ozone are more coherent. This study provides an efficient and accurate method to estimate ground-level ozone and offers a new perspective for modeling spatiotemporal air pollutants.
|
48
|
Zhang JJY, Sun L, Rainham D, Dummer TJB, Wheeler AJ, Anastasopolos A, Gibson M, Johnson M. Predicting intraurban airborne PM1.0-trace elements in a port city: Land use regression by ordinary least squares and a machine learning algorithm. THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 806:150149. [PMID: 34583078 DOI: 10.1016/j.scitotenv.2021.150149] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/02/2021] [Revised: 08/31/2021] [Accepted: 09/01/2021] [Indexed: 06/13/2023]
Abstract
Airborne particulate matter (PM) has been associated with cardiovascular and respiratory morbidity and mortality, and there is some evidence that spatially varying metals found in PM may contribute to adverse health effects. We developed spatially refined models for PM trace elements using ordinary least squares land use regression (OLS-LUR) and machine learning random forest land use regression (RF-LUR). Two-week integrated measurements of PM1.0 (median aerodiameter < 1.0 μm) were collected at 50 sampling sites during fall (2010), winter (2011), and summer (2011) in the Halifax Regional Municipality, Nova Scotia, Canada. PM1.0 filters were analyzed for metals and trace elements using inductively coupled plasma-mass spectrometry. OLS- and RF-LUR models were developed for approximately 30 PM1.0 trace elements in each season. Model predictors included industrial, commercial, and institutional/government/military land use, roadways, shipping, other transportation sources, and wind rose information. RF generated more accurate models than OLS for most trace elements based on 5-fold cross validation. On average, summer models had the highest cross validation R² (OLS-LUR = 0.40, RF-LUR = 0.46), while fall had the lowest (OLS-LUR = 0.27, RF-LUR = 0.31). Many OLS-LUR models displayed overprediction in the final exposure surface. In contrast, RF-LUR models did not exhibit overpredictions. Taking overpredictions and cross validation performances into account, OLS-LUR performed better than RF-LUR in roughly 20% of the seasonal trace element models. RF-LUR models provided more interpretable predictors in most cases. Seasonal predictors varied, likely due to differences in seasonal distribution of trace elements related to source activity and meteorology.
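The 5-fold cross validation R² reported in the abstract above can be illustrated with a minimal sketch. It is not the authors' code: it assumes a single synthetic predictor (a hypothetical `road_density`), a one-variable OLS fit, and an R² computed from pooled held-out predictions, none of which the abstract specifies. The published models use many land-use predictors and also random forest.

```python
import random
from statistics import mean


def fit_ols(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def cross_validated_r2(xs, ys, k=5, seed=0):
    """Pool held-out predictions over k folds and return 1 - SSE/SST."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all sites
    preds = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_ols([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = a + b * xs[i]  # predict only sites the model never saw
    y_bar = mean(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - sse / sst


# Hypothetical data: 50 sampling sites with one synthetic land-use predictor.
rng = random.Random(42)
road_density = [rng.uniform(0, 10) for _ in range(50)]
concentration = [2.0 + 0.5 * x + rng.gauss(0, 1.0) for x in road_density]
cv_r2 = cross_validated_r2(road_density, concentration)
print(f"5-fold CV R2: {cv_r2:.2f}")
```

Because each site is predicted by a model that did not see it, this score is typically lower than the in-sample R², which is why it is the fairer basis for comparing a flexible learner such as random forest with OLS.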
Affiliation(s)
- Joyce J Y Zhang
- Air Health Science Division, Health Canada, Ottawa, ON, Canada
| | - Liu Sun
- Air Health Science Division, Health Canada, Ottawa, ON, Canada
| | - Daniel Rainham
- Healthy Populations Institute and the School of Health and Human Performance, Dalhousie University, Halifax, NS, Canada
| | - Trevor J B Dummer
- School of Population and Public Health, University of British Columbia, Vancouver, BC, Canada
| | - Amanda J Wheeler
- Mary MacKillop Institute for Health Research, Australian Catholic University, Melbourne, VIC, Australia; Menzies Institute for Medical Research, University of Tasmania, Hobart, TAS, Australia
| | - Mark Gibson
- Division of Air Quality and Exposure Science, AirPhoton, Baltimore, MD, USA
| | - Markey Johnson
- Air Health Science Division, Health Canada, Ottawa, ON, Canada.
|
49
|
Gao W, Shu T, Guan Y, Ling S, Liu S, Zhou L. Novel strategy for establishment of an FT-Raman spectroscopy based quantitative model for poplar holocellulose content determination. Carbohydr Polym 2022; 277:118793. [PMID: 34893223 DOI: 10.1016/j.carbpol.2021.118793] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/15/2021] [Revised: 10/13/2021] [Accepted: 10/17/2021] [Indexed: 01/07/2023]
Abstract
Raman spectroscopy is effective for studying the ultrastructure, lignin content, and cellulose crystallinity of lignocellulosic materials. However, the quantitative analysis of holocellulose in lignocellulosic materials by this technique is challenging. In this study, a novel strategy based on Fourier-transform Raman (FT-Raman) spectroscopy was proposed for building a quantitative model of poplar holocellulose content. Five regression algorithms were applied: principal component regression (PCR), partial least squares regression (PLSR), ridge regression (RR), lasso regression (LR), and elastic net regression (ENR). Twelve candidate internal standards were selected for use with these algorithms. Sixty models, combining the five regression algorithms with the twelve internal standards, were evaluated by five-fold cross validation. The models constructed with RR, LR, and ENR using the peak intensity at 2945 cm⁻¹ as the internal standard were credible (Rp > 0.9, RMSEp < 1.0, and MAEp < 0.9), indicating the high potential of FT-Raman spectroscopy for predicting the holocellulose content of lignocellulosic materials.
Affiliation(s)
- Wenli Gao
- School of Forestry and Landscape Architecture, Anhui Agricultural University, Hefei, Anhui 230036, China; Key Lab of State Forest and Grassland Administration on Wood Quality Improvement & High Efficient Utilization, Hefei, Anhui 230036, China
| | - Ting Shu
- School of Physical Science and Technology, Shanghai Tech University, 393 Middle Huaxia Road, Shanghai 201210, China
| | - Ying Guan
- School of Forestry and Landscape Architecture, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Shengjie Ling
- School of Physical Science and Technology, Shanghai Tech University, 393 Middle Huaxia Road, Shanghai 201210, China
| | - Shengquan Liu
- School of Forestry and Landscape Architecture, Anhui Agricultural University, Hefei, Anhui 230036, China; Key Lab of State Forest and Grassland Administration on Wood Quality Improvement & High Efficient Utilization, Hefei, Anhui 230036, China.
| | - Liang Zhou
- School of Forestry and Landscape Architecture, Anhui Agricultural University, Hefei, Anhui 230036, China; Key Lab of State Forest and Grassland Administration on Wood Quality Improvement & High Efficient Utilization, Hefei, Anhui 230036, China.
|
50
|
Huang C, Sun K, Hu J, Xue T, Xu H, Wang M. Estimating 2013-2019 NO2 exposure with high spatiotemporal resolution in China using an ensemble model. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2022; 292:118285. [PMID: 34634409 PMCID: PMC8616822 DOI: 10.1016/j.envpol.2021.118285] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 07/23/2021] [Revised: 09/29/2021] [Accepted: 10/03/2021] [Indexed: 05/30/2023]
Abstract
Air pollution has become a major issue in China, especially for traffic-related pollutants such as nitrogen dioxide (NO2). Current studies in China at the national scale were less focused on NO2 exposure and consequent health effects than fine particulate exposure, mainly due to a lack of high-quality exposure models for accurate NO2 predictions over a long period. We developed an advanced modeling framework that incorporated multisource, high-quality predictor data (e.g., satellite observations [Ozone Monitoring Instrument NO2, TROPOspheric Monitoring Instrument NO2, and Multi-Angle Implementation of Atmospheric Correction aerosol optical depth], chemical transport model simulations, high-resolution geographical variables) and three independent machine learning algorithms into an ensemble model. The model contains three stages: (1) filling missing satellite data; (2) building an ensemble model and predicting daily NO2 concentrations from 2013 to 2019 across China at 1 × 1 km² resolution; (3) downscaling the predictions to finer resolution (100 m) at the urban scale. In cross-validation, the model performs well in reproducing the overall (R² = 0.72) and spatial (R² = 0.85) variations of the observed NO2 concentrations. The model performance remains moderately good when the predictions are extrapolated to the previous years without any monitoring data (CV R² > 0.68) or regions far away from monitors (CV R² > 0.63). We identified a clear decreasing trend of NO2 exposure from 2013 to 2019 across the country with the largest reduction in suburban and rural areas. Our downscaled model further improved the prediction ability by 4%-14% in some megacities and captured substantial NO2 variations within 1-km grids in the urban areas, especially near major roads.
Our model provides flexibility at both temporal and spatial scales and can be applied to exposure assessment and epidemiological studies with various study domains (e.g., national or citywide) and settings (e.g., long-term and short-term).
Affiliation(s)
- Conghong Huang
- Department of Epidemiology and Environmental Health, School of Public Health and Health Professions, University at Buffalo, Buffalo, NY, USA
| | - Kang Sun
- Department of Civil, Structural and Environmental Engineering, University at Buffalo, Buffalo, USA; Research and Education in Energy, Environment and Water Institute, University at Buffalo, Buffalo, NY, USA
| | - Jianlin Hu
- Jiangsu Key Laboratory of Atmospheric Environment Monitoring and Pollution Control, Jiangsu Engineering Technology Research Center of Environmental Cleaning Materials, Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, School of Environmental Science and Engineering, Nanjing University of Information Science and Technology, 219 Ningliu Road, Nanjing, 210044, China
| | - Tao Xue
- Institute of Reproductive and Child Health/Ministry of Health Key Laboratory of Reproductive Health and Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, 100191, China
| | - Hao Xu
- The Ministry of Education Key Laboratory for Earth System Modeling, Department of Earth System Science, Tsinghua University, Beijing, 100084, China
| | - Meng Wang
- Department of Epidemiology and Environmental Health, School of Public Health and Health Professions, University at Buffalo, Buffalo, NY, USA; Research and Education in Energy, Environment and Water Institute, University at Buffalo, Buffalo, NY, USA; Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, WA, USA.
|