1
Garbagna L, Babu Saheer L, Maktab Dar Oghaz M. AI-driven approaches for air pollution modelling: A comprehensive systematic review. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2025; 373:125937. [PMID: 40058557 DOI: 10.1016/j.envpol.2025.125937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2024] [Revised: 02/04/2025] [Accepted: 02/25/2025] [Indexed: 03/28/2025]
Abstract
In recent years, air quality has become a global issue with the rise of harmful pollutants and their effects on climate change. Urban areas are especially affected by air pollution, resulting in a deterioration of the environment and a surge in health complications. Many studies have sought to accurately predict future pollution concentration levels utilising different methods. This paper introduces the current physical models for air quality prediction and conducts an extensive systematic literature review on Machine Learning and Deep Learning techniques for predicting pollutants. This work compares different methodologies and techniques by grouping studies that utilise similar approaches. Furthermore, a distinction is made between temporal and spatiotemporal models to understand and highlight how both approaches impact future air pollutant concentration level predictions. The review differs from similar works as it focuses not only on comparing models and approaches but also on analysing how the usage of external features, such as meteorological data, traffic information, and land usage, affects pollutant levels and the model's accuracy in air quality forecasting. Performances and limitations are explored for both Machine and Deep Learning approaches, and the work offers a discussion on their comparison and possible future developments in this research space. This review highlights how Deep Learning models tend to be more suitable for forecasting problems due to their feature and spatio-temporal correlation representation abilities, and provides different directions for further work, from model utilisation to feature inclusion.
Affiliation(s)
- Lorenzo Garbagna
- Anglia Ruskin University, East Road, Cambridge, CB1 1PT, Cambridgeshire, United Kingdom.
- Lakshmi Babu Saheer
- Anglia Ruskin University, East Road, Cambridge, CB1 1PT, Cambridgeshire, United Kingdom
2
Martenies SE, Oloo A, Magzamen S, Ji N, Khalili R, Kaur S, Xu Y, Yang T, Bastain TM, Breton CV, Farzan SF, Habre R, Dabelea D. Independent and joint effects of neighborhood-level environmental and socioeconomic exposures on body mass index in early childhood: The environmental influences on child health outcomes (ECHO) cohort. ENVIRONMENTAL RESEARCH 2024; 253:119109. [PMID: 38751004 DOI: 10.1016/j.envres.2024.119109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 04/19/2024] [Accepted: 05/07/2024] [Indexed: 05/26/2024]
Abstract
Past studies support the hypothesis that the prenatal period influences childhood growth. However, few studies explore the joint effects of exposures that occur simultaneously during pregnancy. To explore the feasibility of using mixtures methods with neighborhood-level environmental exposures, we assessed the effects of multiple prenatal exposures on body mass index (BMI) from birth to age 24 months. We used data from two cohorts: Healthy Start (n = 977) and Maternal and Developmental Risks from Environmental and Social Stressors (MADRES; n = 303). BMI was measured at delivery and 6, 12, and 24 months and standardized as z-scores. We included variables for air pollutants, built and natural environments, food access, and neighborhood socioeconomic status (SES). We used two complementary statistical approaches: single-exposure linear regression and quantile-based g-computation. Models were fit separately for each cohort and time point and were adjusted for relevant covariates. Single-exposure models identified negative associations between NO2 and distance to parks and positive associations between low neighborhood SES and BMI z-scores for Healthy Start participants; for MADRES participants, we observed negative associations between O3 and distance to parks and BMI z-scores. G-computation models produced comparable results for each cohort: higher exposures were generally associated with lower BMI, although results were not significant. Results from the g-computation models, which do not require a priori knowledge of the direction of associations, indicated that the direction of associations between mixture components and BMI varied by cohort and time point. Our study highlights challenges in assessing mixtures effects at the neighborhood level and in harmonizing exposure data across cohorts. For example, geospatial data of neighborhood-level exposures may not fully capture the qualities that might influence health behavior.
Studies aiming to harmonize geospatial data from different geographical regions should consider contextual factors when operationalizing exposure variables.
Affiliation(s)
- Sheena E Martenies
- Kinesiology and Community Health, University of Illinois Urbana-Champaign, Urbana, IL, USA; Division of Nutritional Sciences, University of Illinois Urbana-Champaign, Urbana, IL, USA; Family Resiliency Center, University of Illinois Urbana-Champaign, Urbana, IL, USA.
- Alice Oloo
- Kinesiology and Community Health, University of Illinois Urbana-Champaign, Urbana, IL, USA
- Sheryl Magzamen
- Environmental and Radiological Health Sciences, Colorado State University, Fort Collins, CO, USA; Epidemiology, Colorado School of Public Health, Aurora, CO, USA
- Nan Ji
- Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Roxana Khalili
- Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Simrandeep Kaur
- Kinesiology and Community Health, University of Illinois Urbana-Champaign, Urbana, IL, USA
- Yan Xu
- Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA
- Tingyu Yang
- Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Theresa M Bastain
- Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Carrie V Breton
- Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Shohreh F Farzan
- Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Rima Habre
- Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA
- Dana Dabelea
- Epidemiology, Colorado School of Public Health, Aurora, CO, USA; Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Lifecourse Epidemiology of Adiposity and Diabetes (LEAD) Center, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
3
AlShehhi A, Welsch R. Artificial intelligence for improving Nitrogen Dioxide forecasting of Abu Dhabi environment agency ground-based stations. JOURNAL OF BIG DATA 2023; 10:92. [PMID: 37303479 PMCID: PMC10236404 DOI: 10.1186/s40537-023-00754-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 05/08/2023] [Indexed: 06/13/2023]
Abstract
Nitrogen Dioxide (NO2) is a common air pollutant associated with several adverse health problems such as pediatric asthma, cardiovascular mortality, and respiratory mortality. Due to society's urgent need to reduce pollutant concentrations, several scientific efforts have been devoted to understanding pollutant patterns and predicting pollutants' future concentrations using machine learning and deep learning techniques. The latter techniques have recently gained much attention due to their capability to tackle complex and challenging problems in computer vision, natural language processing, etc. In the NO2 context, there is still a research gap in adopting those advanced methods to predict the concentration of pollutants. This study fills the gap by comparing the performance of several state-of-the-art artificial intelligence models that have not yet been adopted in this context. The models were trained using time series cross-validation on a rolling basis and tested across different periods using NO2 data from 20 ground-based monitoring stations collected by the Environment Agency - Abu Dhabi, United Arab Emirates. Using the seasonal Mann-Kendall trend test and Sen's slope estimator, we further explored and investigated the pollutant trends across the different stations. This study is the first comprehensive study that reported the temporal characteristics of NO2 across seven environmental assessment points and compared the performance of the state-of-the-art deep learning models for predicting the pollutants' future concentration. Our results reveal a difference in the pollutant concentration levels due to the geographic location of the different stations, with a statistically significant decrease in the NO2 annual trend for the majority of the stations. Overall, NO2 concentrations exhibit a similar daily and weekly pattern across the different stations, with an increase in the pollutant level during the early morning and the first working day. Comparing the state-of-the-art models, the transformer model demonstrated superior performance (MAE: 0.04 (± 0.04), MSE: 0.06 (± 0.04), RMSE: 0.001 (± 0.01), R2: 0.98 (± 0.05)) compared with LSTM (MAE: 0.26 (± 0.19), MSE: 0.31 (± 0.21), RMSE: 0.14 (± 0.17), R2: 0.56 (± 0.33)), InceptionTime (MAE: 0.19 (± 0.18), MSE: 0.22 (± 0.18), RMSE: 0.08 (± 0.13), R2: 0.38 (± 1.35)), ResNet (MAE: 0.24 (± 0.16), MSE: 0.28 (± 0.16), RMSE: 0.11 (± 0.12), R2: 0.35 (± 1.19)), XceptionTime (MAE: 0.7 (± 0.55), MSE: 0.79 (± 0.54), RMSE: 0.91 (± 1.06), R2: -4.83 (± 9.38)), and MiniRocket (MAE: 0.21 (± 0.07), MSE: 0.26 (± 0.08), RMSE: 0.07 (± 0.04), R2: 0.65 (± 0.28)). The transformer model is a powerful model for improving the accuracy of NO2 level forecasts and could strengthen the current monitoring system to control and manage the air quality in the region. Supplementary Information: The online version contains supplementary material available at 10.1186/s40537-023-00754-z.
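As a reading aid for the metrics quoted in this abstract, the following is a minimal sketch of the textbook definitions of MAE, MSE, RMSE and R2, together with one possible reading of "time series cross-validation on a rolling basis" (an expanding training window followed by a fixed-size test block). The function names `regression_metrics` and `rolling_origin_splits`, their parameters, and the expanding-window choice are assumptions made for illustration; the abstract does not state how the authors implemented either step.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 under their standard definitions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)  # RMSE is the square root of MSE
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    ss_res = sum(e * e for e in errors)
    # R2 is negative when the model is worse than predicting the mean.
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


def rolling_origin_splits(n_samples, n_folds, test_size):
    """Yield (train_indices, test_indices) with an expanding training window.

    Hypothetical helper: each fold trains on everything before its test
    block, so no future observation leaks into training.
    """
    first_test_start = n_samples - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        test_start = first_test_start + k * test_size
        yield range(0, test_start), range(test_start, test_start + test_size)


if __name__ == "__main__":
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
    for train_idx, test_idx in rolling_origin_splits(10, 3, 2):
        print(list(train_idx), list(test_idx))
```

Because R2 compares the model's squared error with that of a constant mean predictor, strongly negative values such as the one reported for XceptionTime are possible under this definition.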
Affiliation(s)
- Aamna AlShehhi
- Biomedical Engineering, Khalifa University, Abu Dhabi, United Arab Emirates
- Roy Welsch
- Sloan School of Management and Statistics, Massachusetts Institute of Technology, Cambridge, Massachusetts USA
4
Iskandaryan D, Ramos F, Trilles S. Comparison of Nitrogen Dioxide Predictions During a Pandemic and Non-pandemic Scenario in the City of Madrid using a Convolutional LSTM Network. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS 2022. [DOI: 10.1142/s1469026822500146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Traditionally, machine learning technologies, with the methods and capabilities available and combined with a geospatial dimension, can perform predictive analyses of air quality with greater accuracy. However, air pollution is influenced by many external factors, one of which arose recently from the restrictions applied to curb the advance of COVID-19. These sudden changes in air quality levels can negatively influence current forecasting models. This work compares air pollution forecasts during a pandemic and non-pandemic period under the same conditions. The ConvLSTM algorithm was applied to predict the concentration of nitrogen dioxide using data from the air quality and meteorological stations in Madrid. The proposed model was applied for two scenarios: pandemic (January–June 2020) and non-pandemic (January–June 2019), each with sub-scenarios based on time granularity (1-h, 12-h, 24-h and 48-h) and combination of features. The Root Mean Square Error was taken as the evaluation metric, and the results showed that the proposed method outperformed a reference model and that the feature selection technique significantly improved the overall accuracy.
Affiliation(s)
- Ditsuhi Iskandaryan
- Institute of New Imaging Technologies, Universitat Jaume I, Avinguda de Vicent Sos Baynat, s/n Castelló de la Plana 12071, Spain
- Francisco Ramos
- Institute of New Imaging Technologies, Universitat Jaume I, Avinguda de Vicent Sos Baynat, s/n Castelló de la Plana 12071, Spain
- Sergio Trilles
- Institute of New Imaging Technologies, Universitat Jaume I, Avinguda de Vicent Sos Baynat, s/n Castelló de la Plana 12071, Spain
5
Iskandaryan D, Ramos F, Trilles S. Bidirectional convolutional LSTM for the prediction of nitrogen dioxide in the city of Madrid. PLoS One 2022; 17:e0269295. [PMID: 35648766 PMCID: PMC9159618 DOI: 10.1371/journal.pone.0269295] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/17/2022] [Accepted: 05/18/2022] [Indexed: 12/03/2022]
Abstract
Nitrogen dioxide is one of the pollutants with the most significant health effects. Advance information on its concentration in the air can help to monitor and control further consequences more effectively, while also making it easier to apply preventive and mitigating measures. Machine learning technologies with available methods and capabilities, combined with the geospatial dimension, can perform predictive analyses with higher accuracy and, as a result, can serve as a supportive tool for productive management. One of the most advanced machine learning algorithms, Bidirectional convolutional LSTM, is used in this ongoing work to predict the concentration of nitrogen dioxide. The model has been validated to perform more accurate spatiotemporal analysis based on the integration of temporal and geospatial factors. The analysis was carried out according to two scenarios developed on the basis of selected features using data from the city of Madrid for the periods January-June 2019 and January-June 2020. Evaluation of the model's performance was conducted using the Root Mean Square Error and the Mean Absolute Error, which emphasise the superiority of the proposed model over the reference models. In addition, the significance of a feature selection technique providing improved accuracy was underlined. In terms of execution time, the complexity of the Bidirectional convolutional LSTM architecture meant that convergence and generalisation took longer, so the reference models were superior in this respect.
Affiliation(s)
- Ditsuhi Iskandaryan
- Institute of New Imaging Technologies (INIT), Universitat Jaume I, Castelló de la Plana, Castellón, Spain
- Francisco Ramos
- Institute of New Imaging Technologies (INIT), Universitat Jaume I, Castelló de la Plana, Castellón, Spain
- Sergio Trilles
- Institute of New Imaging Technologies (INIT), Universitat Jaume I, Castelló de la Plana, Castellón, Spain
6
Kianfar N, Mesgari MS, Mollalo A, Kaveh M. Spatio-temporal modeling of COVID-19 prevalence and mortality using artificial neural network algorithms. Spat Spatiotemporal Epidemiol 2022; 40:100471. [PMID: 35120681 PMCID: PMC8580864 DOI: 10.1016/j.sste.2021.100471] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/21/2021] [Revised: 10/03/2021] [Accepted: 11/04/2021] [Indexed: 01/09/2023]
Abstract
The outbreak of coronavirus disease (COVID-19) has become one of the most challenging global concerns in recent years. Due to inadequate worldwide studies on spatio-temporal modeling of COVID-19, this research aims to examine the relative significance of potential explanatory variables (n = 75) concerning COVID-19 prevalence and mortality using multilayer perceptron artificial neural network topology. We utilized ten variable importance analysis methods to identify the relative importance of the explanatory variables. The main findings indicated that several variables were persistently among the most influential in all periods. Regarding COVID-19 prevalence, unemployment and population density were among the most influential variables, with the highest importance scores. For COVID-19 mortality, health-related variables such as diabetes prevalence and number of hospital beds were among the most significant. The findings from this study might provide general insights for public health policymakers to monitor the spread of disease and support decision-making.
Affiliation(s)
- Nima Kianfar
- Faculty of Geodesy and Geomatics, K. N. Toosi University of Technology, Tehran 19967-15433, Iran.
- Mohammad Saadi Mesgari
- Faculty of Geodesy and Geomatics, K. N. Toosi University of Technology, Tehran 19967-15433, Iran
- Abolfazl Mollalo
- Department of Public Health and Prevention Science, School of Health Sciences, Baldwin Wallace University, Berea, OH 44017, USA
- Mehrdad Kaveh
- Faculty of Geodesy and Geomatics, K. N. Toosi University of Technology, Tehran 19967-15433, Iran
7
Abstract
Air pollution and its consequences are negatively impacting the world population and the environment, which makes air quality monitoring and forecasting techniques essential tools to combat this problem. To predict air quality with maximum accuracy, it is crucial to consider the dataset types along with the implemented models and the quantity of data. This study selected a set of research works in the field of air quality prediction and concentrates on exploring the datasets utilised in them. The most significant findings of this research work are: (1) meteorological datasets were used in 94.6% of the papers, far more than any other dataset type, and were complemented with others, such as temporal data, spatial data, and so on; (2) combinations of various datasets have been used since 2009; and (3) open data have been used since 2012; 32.3% of the studies used open data, and 63.4% of the studies did not provide the data.
8
Habre R, Girguis M, Urman R, Fruin S, Lurmann F, Shafer M, Gorski P, Franklin M, McConnell R, Avol E, Gilliland F. Contribution of tailpipe and non-tailpipe traffic sources to quasi-ultrafine, fine and coarse particulate matter in southern California. JOURNAL OF THE AIR & WASTE MANAGEMENT ASSOCIATION (1995) 2021; 71:209-230. [PMID: 32990509 PMCID: PMC8112073 DOI: 10.1080/10962247.2020.1826366] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 06/29/2020] [Revised: 08/21/2020] [Accepted: 09/09/2020] [Indexed: 05/19/2023]
Abstract
Exposure to traffic-related air pollution (TRAP) in the near-roadway environment is associated with multiple adverse health effects. This study aimed to characterize the relative contribution of tailpipe and non-tailpipe TRAP sources to particulate matter (PM) in the quasi-ultrafine (PM0.2), fine (PM2.5) and coarse (PM2.5-10) size fractions and to identify their spatial determinants in southern California (CA). Month-long integrated PM0.2, PM2.5 and PM2.5-10 samples (n = 461, 265 and 298, respectively) were collected across cool and warm seasons in 8 southern CA communities (2008-9). Concentrations of PM mass, elements, carbons and major ions were obtained. Enrichment ratios (ER) in PM0.2 and PM10 relative to PM2.5 were calculated for each element. The Positive Matrix Factorization model was used to resolve and estimate the relative contribution of TRAP sources to PM in three size fractions. Generalized additive models (GAMs) with bivariate loess smooths were used to understand the geographic variation of TRAP sources and identify their spatial determinants. EC, OC, and B had the highest median ER in PM0.2 relative to PM2.5. Six, seven and five sources (with characteristic species) were resolved in PM0.2, PM2.5 and PM2.5-10, respectively. Combined tailpipe and non-tailpipe traffic sources contributed 66%, 32% and 18% of PM0.2, PM2.5 and PM2.5-10 mass, respectively. Tailpipe traffic emissions (EC, OC, B) were the largest contributor to PM0.2 mass (58%). Distinct gasoline and diesel tailpipe traffic sources were resolved in PM2.5. Others included fuel oil, biomass burning, secondary inorganic aerosol, sea salt, and crustal/soil. CALINE4 dispersion model nitrogen oxides, trucks and intersections were most correlated with TRAP sources. The influence of smaller roadways and intersections became more apparent once Long Beach was excluded. Non-tailpipe emissions constituted ~8%, 11% and 18% of PM0.2, PM2.5 and PM2.5-10, respectively, with important exposure and health implications.
Future efforts should consider non-linear relationships amongst predictors when modeling exposures. Implications: Vehicle emissions result in a complex mix of air pollutants with both tailpipe and non-tailpipe components. As mobile source regulations lead to decreased tailpipe emissions, the relative contribution of non-tailpipe traffic emissions to near-roadway exposures is increasing. This study documents the presence of non-tailpipe abrasive vehicular emissions (AVE) from brake and tire wear, catalyst degradation and resuspended road dust in the quasi-ultrafine (PM0.2), fine and coarse particulate matter size fractions, with contributions reaching up to 30% in PM0.2 in some southern California communities. These findings have important exposure and policy implications given the high metal content of AVE and the efficiency of PM0.2 at reaching the alveolar region of the lungs and other organ systems once inhaled. This work also highlights important considerations for building models that can accurately predict tailpipe and non-tailpipe exposures for population health studies.
Affiliation(s)
- Rima Habre
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA
- Mariam Girguis
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA
- Robert Urman
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA
- Scott Fruin
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA
- Martin Shafer
- Wisconsin State Laboratory of Hygiene, University of Wisconsin-Madison, Madison, WI
- Environmental Chemistry & Technology Program, University of Wisconsin-Madison, Madison WI
- Patrick Gorski
- Wisconsin State Laboratory of Hygiene, University of Wisconsin-Madison, Madison, WI
- Meredith Franklin
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA
- Rob McConnell
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA
- Ed Avol
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA
- Frank Gilliland
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA
9
Li L, Girguis M, Lurmann F, Pavlovic N, McClure C, Franklin M, Wu J, Oman LD, Breton C, Gilliland F, Habre R. Ensemble-based deep learning for estimating PM2.5 over California with multisource big data including wildfire smoke. ENVIRONMENT INTERNATIONAL 2020; 145:106143. [PMID: 32980736 PMCID: PMC7643812 DOI: 10.1016/j.envint.2020.106143] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 05/07/2020] [Revised: 08/14/2020] [Accepted: 09/13/2020] [Indexed: 05/21/2023]
Abstract
INTRODUCTION: Estimating PM2.5 concentrations and their prediction uncertainties at a high spatiotemporal resolution is important for air pollution health effect studies. This is particularly challenging for California, which has high variability in natural (e.g., wildfires, dust) and anthropogenic emissions, meteorology, topography (e.g., desert surfaces, mountains, snow cover) and land use. METHODS: Using ensemble-based deep learning with big data fused from multiple sources, we developed a PM2.5 prediction model with uncertainty estimates at a high spatial (1 km × 1 km) and temporal (weekly) resolution for a 10-year time span (2008-2017). We leveraged autoencoder-based full residual deep networks to model complex nonlinear interrelationships among PM2.5 emission, transport and dispersion factors and other influential features. These included remote sensing data (MAIAC aerosol optical depth (AOD), normalized difference vegetation index, impervious surface), MERRA-2 GMI Replay Simulation (M2GMI) output, wildfire smoke plume dispersion, meteorology, land cover, traffic, elevation, and spatiotemporal trends (geo-coordinates, temporal basis functions, time index). As one of the primary predictors of interest with substantial missing data in California related to bright surfaces, cloud cover and other known interferences, missing MAIAC AOD observations were imputed and adjusted for relative humidity and vertical distribution. Wildfire smoke contribution to PM2.5 was also calculated through HYSPLIT dispersion modeling of smoke emissions derived from MODIS fire radiative power using the Fire Energetics and Emissions Research version 1.0 model. RESULTS: Ensemble deep learning to predict PM2.5 achieved an overall mean training RMSE of 1.54 μg/m3 (R2: 0.94) and test RMSE of 2.29 μg/m3 (R2: 0.87). The top predictors included M2GMI carbon monoxide mixing ratio in the bottom layer, temporal basis functions, spatial location, air temperature, MAIAC AOD, and PM2.5 sea salt mass concentration. In an independent test using three long-term AQS sites and one short-term non-AQS site, our model achieved a high correlation (>0.8) and a low RMSE (<3 μg/m3). Statewide predictions indicated that our model can capture the spatial distribution and temporal peaks in wildfire-related PM2.5. The coefficient of variation indicated highest uncertainty over deciduous and mixed forests and open water land covers. CONCLUSION: Our method can be generalized to other regions, including those having a mix of major urban areas, deserts, intensive smoke events, snow cover and complex terrains, where PM2.5 has previously been challenging to predict. Prediction uncertainty estimates can also inform further model development and measurement error evaluations in exposure and health studies.
Affiliation(s)
- Lianfa Li
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA; State Key Laboratory of Resources and Environmental Information System, Institute of Geographical Sciences and Natural Resources, Chinese Academy of Sciences, Beijing, China.
- Mariam Girguis
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Meredith Franklin
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Jun Wu
- Program in Public Health, Susan and Henry Samueli College of Health Sciences, University of California, Irvine, CA, USA
- Luke D Oman
- Goddard Space Flight Center, National Aeronautics and Space Administration, Greenbelt, MD, USA
- Carrie Breton
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Frank Gilliland
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Rima Habre
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA.
10
Ren X, Mi Z, Georgopoulos PG. Comparison of Machine Learning and Land Use Regression for fine scale spatiotemporal estimation of ambient air pollution: Modeling ozone concentrations across the contiguous United States. ENVIRONMENT INTERNATIONAL 2020; 142:105827. [PMID: 32593834 DOI: 10.1016/j.envint.2020.105827] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 01/13/2020] [Revised: 04/29/2020] [Accepted: 05/19/2020] [Indexed: 06/11/2023]
Abstract
BACKGROUND: Spatial linear Land-Use Regression (LUR) is commonly used for long-term modeling of air pollution in support of exposure and epidemiological assessments. Machine Learning (ML) methods in conjunction with spatiotemporal modeling can provide more flexible exposure-relevant metrics and have been studied using different model structures. There is, however, a lack of comparisons of methods available within these two modeling frameworks that can guide model/algorithm selection in air quality epidemiology. OBJECTIVE: The present study compares thirteen algorithms for spatial/spatiotemporal modeling applied for daily maxima of 8-hour running averages of ambient ozone concentrations at spatial resolutions corresponding to census tracts, to support estimation of annual ozone design values across the contiguous US. These algorithms were selected from nine representative categories and trained using predictors that included chemistry-transport model predictions, meteorological factors, land use and land cover, and stationary and mobile emissions. METHODS: To obtain the best predictive performance, model structures were optimized through a repeated coarse/fine grid search with expert knowledge. Six target-oriented validation strategies were used to prevent overfitting and avoid over-optimistic model evaluation results. In order to take full advantage of the power of different algorithms, we introduced tuning sample weights in spatiotemporal modeling to ensure predictive accuracy of peak concentrations, which is crucial for exposure assessments. In spatial modeling, four interpretation and visualization tools were introduced to explain predictions from different algorithms. RESULTS: Nonlinear ML methods achieved higher prediction accuracy than linear LUR, and the improvements were more significant for spatiotemporal modeling (nearly 10%-40% decrease of predicted RMSE). By tuning the sample weights, spatiotemporal models can predict concentrations used to calculate ozone design values that are comparable to or even better than spatial models (nearly 30% decrease of cross-validated RMSE). We visualized the underlying nonlinear relationships, heterogeneous associations and complex interactions from the two best performing ML algorithms, i.e., Random Forest and Extreme Gradient Boosting, and found that the complex patterns were relatively less significant with respect to model accuracy for spatial modeling. CONCLUSION: Machine Learning can provide estimates that are actually more interpretable and practical than linear regression to improve accuracy in modeling human exposures. A careful design of hyperparameter tuning and flexible data splitting and validations is crucial to obtain reliable and stable results. Desirable/successful nonlinear models are expected to capture similar nonlinear patterns and interactions using different ML algorithms.
Affiliation(s)
- Xiang Ren
  - Environmental and Occupational Health Sciences Institute (EOHSI), Rutgers University, Piscataway, NJ 08854, USA; Department of Chemical and Biochemical Engineering, Rutgers University, Piscataway, NJ 08854, USA
- Zhongyuan Mi
  - Environmental and Occupational Health Sciences Institute (EOHSI), Rutgers University, Piscataway, NJ 08854, USA; Department of Environmental Sciences, Rutgers University, New Brunswick, NJ 08901, USA
- Panos G Georgopoulos
  - Environmental and Occupational Health Sciences Institute (EOHSI), Rutgers University, Piscataway, NJ 08854, USA; Department of Chemical and Biochemical Engineering, Rutgers University, Piscataway, NJ 08854, USA; Department of Environmental Sciences, Rutgers University, New Brunswick, NJ 08901, USA; Department of Environmental and Occupational Health, Rutgers School of Public Health, Piscataway, NJ 08854, USA
|
11
|
Herting MM, Younan D, Campbell CE, Chen JC. Outdoor Air Pollution and Brain Structure and Function From Across Childhood to Young Adulthood: A Methodological Review of Brain MRI Studies. Front Public Health 2019; 7:332. [PMID: 31867298 PMCID: PMC6908886 DOI: 10.3389/fpubh.2019.00332] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 04/17/2019] [Accepted: 10/25/2019] [Indexed: 12/19/2022]
Abstract
Outdoor air pollution has been recognized as a novel environmental neurotoxin. Studies have begun to use brain Magnetic Resonance Imaging (MRI) to investigate how air pollution may adversely impact developing brains. A systematic review was conducted to evaluate and synthesize the reported evidence from MRI studies on how early-life exposure to outdoor air pollution affects neurodevelopment. Using PubMed and Web of Knowledge, we conducted a systematic search, followed by a structural review of original articles that had individual-level exposure data and met other inclusion criteria. Six studies were identified, each sampled from 3 cohorts of children in Spain, The Netherlands, and the United States. All studies included a one-time assessment of brain MRI when children were 6–12 years old. Air pollutants from traffic and/or regional sources, including polycyclic aromatic hydrocarbons (PAHs), nitrogen dioxide, elemental carbon, particulate matter (<2.5 or <10 μm), and copper, were estimated prenatally (n = 1), during childhood (n = 3), or both (n = 2), using personal monitoring and urinary biomarkers (n = 1), air sampling at schools (n = 4), or land-use regression (LUR) modeling based on residences (n = 2). Associations between exposure and brain outcomes were noted, including: smaller white matter surface area (n = 1) and microstructure (n = 1); region-specific patterns of cortical thinness (n = 1) and smaller volumes and/or less density within the caudate (n = 3); altered resting-state functional connectivity (n = 2) and brain activity in response to sensory stimuli (n = 1). Preliminary findings suggest that outdoor air pollutants may impact MRI brain structure and function, but limitations highlight that the design of future air pollution-neuroimaging studies needs to incorporate a developmental neurosciences perspective, considering the exposure timing, age of study population, and the most appropriate neurodevelopmental milestones.
Affiliation(s)
- Megan M Herting
  - Department of Preventive Medicine, Keck School of Medicine of University of Southern California, Los Angeles, CA, United States; Department of Pediatrics, Children's Hospital Los Angeles, Los Angeles, CA, United States
- Diana Younan
  - Department of Preventive Medicine, Keck School of Medicine of University of Southern California, Los Angeles, CA, United States
- Claire E Campbell
  - Department of Preventive Medicine, Keck School of Medicine of University of Southern California, Los Angeles, CA, United States
- Jiu-Chiuan Chen
  - Department of Preventive Medicine, Keck School of Medicine of University of Southern California, Los Angeles, CA, United States; Department of Neurology, Keck School of Medicine of University of Southern California, Los Angeles, CA, United States
|
12
|
Liu X, Ye Y, Chen Y, Li X, Feng B, Cao G, Xiao J, Zeng W, Li X, Sun J, Ning D, Yang Y, Yao Z, Guo Y, Wang Q, Zhang Y, Ma W, Du Q, Zhang B, Liu T. Effects of prenatal exposure to air particulate matter on the risk of preterm birth and roles of maternal and cord blood LINE-1 methylation: A birth cohort study in Guangzhou, China. ENVIRONMENT INTERNATIONAL 2019; 133:105177. [PMID: 31622906 DOI: 10.1016/j.envint.2019.105177] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 05/16/2019] [Revised: 08/10/2019] [Accepted: 09/09/2019] [Indexed: 06/10/2023]
Abstract
BACKGROUND Epidemiological studies have found that increased risk of preterm birth (PTB) is associated with higher prenatal exposure to PM10 and PM2.5, but few studies have been conducted to assess the impacts of extremely fine particulate matter (PM1), which may have more toxic effects than other types of ambient particulate air pollution (PM). Several studies have separately investigated the associations between DNA methylation and PTB risk and PM. Maternal LINE-1 methylation level negatively correlated with prenatal exposure to PM and risk of PTB. A comprehensive picture is lacking regarding the associations between prenatal exposure to PM, LINE-1 methylation, and risk of PTB. OBJECTIVES This study aimed to estimate the effects of exposure to ambient PM of different sizes (PM10, PM2.5, and PM1) during pregnancy on risk of PTB, identify susceptible exposure windows, and illustrate the roles of LINE-1 methylation in the associations between PM and PTB risk. METHODS The Birth Cohort Study on Prenatal Environments and Offspring Health (PEOH) has been ongoing since 2016 in Guangzhou, China. A total of 4928 pregnant women were recruited during early pregnancy, and 4278 (86.8%) were successfully followed up. Each individual's weekly exposure to PM10 and PM2.5 from 3 months before pregnancy to childbirth was assessed using a spatiotemporal land use regression model, and the weekly PM1 exposure was estimated by employing a generalized additive model. Maternal and cord blood LINE-1 methylation levels (%5mC) were tested using bisulfite-PCR pyrosequencing. A distributed lag nonlinear model incorporated with a Cox proportional hazard model was applied to assess the effect of week-specific maternal PM exposure on PTB risk, and a multiple linear regression model was employed to investigate the associations between PM exposure and LINE-1 methylation levels of maternal and cord blood. We also assessed the associations between LINE-1 methylation levels and PTB risk by using a logistic regression model. RESULTS The risk of PTB was positively associated with PM2.5 and PM1 concentrations during the 12th to 20th gestational weeks, and the strongest association was in the fourth quartile (Q4) versus the first quartile (Q1) and observed during the 16th gestational week (PM2.5: hazard ratio [HR] = 1.18, 95%CI: 1.04-1.35, IQR = 11.94 μg/m³; PM1: HR = 1.20, 95%CI: 1.03-1.39, IQR = 11.36 μg/m³). We observed significant negative associations of PM10 (β = -0.51%5mC per 10 μg/m³, P = 0.014), PM2.5 (β = -0.66%5mC per 10 μg/m³, P = 0.032) and PM1 (β = -0.67%5mC per 10 μg/m³, P = 0.032) concentrations with cord blood LINE-1 methylation levels, and a negative association between PM1 concentration and maternal LINE-1 methylation level (β = -0.86%5mC per 10 μg/m³, P = 0.034). CONCLUSION Higher prenatal exposure to PM1 and PM2.5 during the 12th to 20th gestational weeks was associated with increased risk of PTB. Maternal and fetal LINE-1 methylation alteration might be an underlying mechanism by which PM increases the risk of PTB.
Affiliation(s)
- Xin Liu
  - Guangdong Provincial Institute of Public Health, Guangdong Provincial Center for Disease Control and Prevention, Guangzhou 511430, China
- Yufeng Ye
  - Guangzhou Panyu Central Hospital, Guangzhou 511400, China
- Yi Chen
  - Guangzhou Panyu Central Hospital, Guangzhou 511400, China
- Xiaona Li
  - Department of Environmental and Occupational Health, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China
- Baixiang Feng
  - Guangdong Provincial Institute of Public Health, Guangdong Provincial Center for Disease Control and Prevention, Guangzhou 511430, China
- Ganxiang Cao
  - Guangdong Provincial Institute of Public Health, Guangdong Provincial Center for Disease Control and Prevention, Guangzhou 511430, China
- Jianpeng Xiao
  - Guangdong Provincial Institute of Public Health, Guangdong Provincial Center for Disease Control and Prevention, Guangzhou 511430, China
- Weilin Zeng
  - Guangdong Provincial Institute of Public Health, Guangdong Provincial Center for Disease Control and Prevention, Guangzhou 511430, China
- Xing Li
  - Guangdong Provincial Institute of Public Health, Guangdong Provincial Center for Disease Control and Prevention, Guangzhou 511430, China
- Jiufeng Sun
  - Guangdong Provincial Institute of Public Health, Guangdong Provincial Center for Disease Control and Prevention, Guangzhou 511430, China
- Dan Ning
  - Guangdong Provincial Institute of Public Health, Guangdong Provincial Center for Disease Control and Prevention, Guangzhou 511430, China
- Yi Yang
  - School of Public Health, Guangdong Pharmaceutical University, Guangzhou 510080, China
- Zhenjiang Yao
  - School of Public Health, Guangdong Pharmaceutical University, Guangzhou 510080, China
- Yuming Guo
  - Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
- Qiong Wang
  - School of Public Health, Sun Yat-sen University, Guangzhou 510080, China
- Yonghui Zhang
  - Guangdong Provincial Center for Disease Control and Prevention, Guangzhou 511430, China
- Wenjun Ma
  - Guangdong Provincial Institute of Public Health, Guangdong Provincial Center for Disease Control and Prevention, Guangzhou 511430, China; General Practice Center, Nanhai Hospital, Southern Medical University, Foshan 528200, China
- Qingfeng Du
  - General Practice Center, Nanhai Hospital, Southern Medical University, Foshan 528200, China
- Bo Zhang
  - Food Safety and Health Research Center, School of Public Health, Southern Medical University, Guangzhou 510515, China; Department of Environmental and Occupational Health, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China
- Tao Liu
  - Guangdong Provincial Institute of Public Health, Guangdong Provincial Center for Disease Control and Prevention, Guangzhou 511430, China; General Practice Center, Nanhai Hospital, Southern Medical University, Foshan 528200, China
|