1. Wang Z, Wu X, Wu Y. A spatiotemporal XGBoost model for PM2.5 concentration prediction and its application in Shanghai. Heliyon 2023; 9:e22569. [PMID: 38058450 PMCID: PMC10696222 DOI: 10.1016/j.heliyon.2023.e22569] [Received: 10/16/2022] [Revised: 11/13/2023] [Accepted: 11/15/2023]
Abstract
This paper constructs an analytical and forecasting framework to predict PM2.5 concentration levels for 16 municipal districts in Shanghai. Through XGBoost parameter tuning, empirical mode decomposition, and model fusion, the prediction accuracy and stability of XGBoost are improved so that prediction deviations at extreme points can be avoided. The main findings are as follows: 1) compared with the original model, the goodness of fit of the modified XGBoost model on the test set increased by 17%, and the root mean square error decreased by 28%; 2) PM2.5 concentration in Shanghai shows a significant seasonal (cyclical) effect with a fluctuation period of 3 months (a quarter), and extreme values occur significantly more often in winter than in other seasons; 3) spatially, PM2.5 concentration is higher in the central city than in rural areas and decreases gradually from the city center toward the surrounding areas. The innovations and contributions of this paper are as follows: 1) the EEMD algorithm, verified by SSA, was used to decompose the original model without reconstructing all subsequences, and variable weight assignment was used to obtain the best weighting among the parts of the hybrid model; 2) the city was divided by administrative district to avoid duplicate analysis when applying Kriging interpolation; 3) the IDW method was applied to verify the Kriging interpolation and increase its accuracy; 4) latitude and longitude were converted into arc lengths on the corresponding spherical surface; 5) a hierarchical analysis method was used to rank the importance of the PM2.5 monitoring stations, which improves accuracy and achieves dimension reduction.
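The abstract lists two geometric steps: converting latitude and longitude into arc lengths, and using IDW to check the Kriging interpolation. The sketch below shows one plausible form of both. It assumes a local equirectangular approximation around a reference point and an IDW power of 2; the function names, the Earth-radius constant, and these choices are illustrative assumptions, not the authors' code.

```python
import math

# Mean Earth radius in km (an assumed constant; the paper's value is not stated).
EARTH_RADIUS_KM = 6371.0


def to_arc_length_km(lat_deg, lon_deg, ref_lat_deg, ref_lon_deg):
    """Convert a latitude/longitude pair to (x, y) arc lengths in km measured
    from a reference point, using a local equirectangular approximation."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    ref_lat, ref_lon = math.radians(ref_lat_deg), math.radians(ref_lon_deg)
    # North-south arc length along a meridian.
    y = EARTH_RADIUS_KM * (lat - ref_lat)
    # East-west arc length along the parallel through the reference latitude.
    x = EARTH_RADIUS_KM * math.cos(ref_lat) * (lon - ref_lon)
    return x, y


def idw(target_xy, stations, power=2.0):
    """Inverse distance weighting. stations is a list of ((x, y), value)."""
    num = den = 0.0
    for (sx, sy), value in stations:
        d = math.hypot(target_xy[0] - sx, target_xy[1] - sy)
        if d == 0.0:
            return value  # target coincides with a monitoring station
        weight = 1.0 / d ** power
        num += weight * value
        den += weight
    return num / den
```

Projecting station coordinates to kilometres first keeps IDW distances in consistent units, which raw degrees are not: a degree of longitude shrinks with the cosine of latitude.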
Affiliation(s)
- Zidong Wang
- School of Economics and Management, Shanghai Maritime University, Shanghai 201306, China
- Xianhua Wu
- School of Economics and Management, Shanghai Maritime University, Shanghai 201306, China
- You Wu
- School of Economics and Management, Shanghai Maritime University, Shanghai 201306, China
2. Sarafian R, Kloog I, Rosenblatt JD. Optimal-design domain-adaptation for exposure prediction in two-stage epidemiological studies. Journal of Exposure Science & Environmental Epidemiology 2023; 33:963-970. [PMID: 35459930 DOI: 10.1038/s41370-022-00438-5] [Received: 07/14/2021] [Revised: 03/28/2022] [Accepted: 03/29/2022]
Abstract
BACKGROUND In the first stage of a two-stage study, the researcher uses a statistical model to impute the unobserved exposures. In the second stage, the imputed exposures serve as covariates in epidemiological models. Imputation errors in the first stage act as measurement errors in the second stage and thus bias exposure effect estimates. OBJECTIVE This study aims to improve the estimation of exposure effects by sharing information between the first and second stages. METHODS At the heart of our estimator is the observation that not all second-stage observations are equally important to impute. We therefore borrow ideas from optimal experimental design theory to identify individuals of higher importance. We then improve the imputation for these individuals using ideas from the machine-learning literature on domain adaptation. RESULTS Our simulations confirm that the exposure effect estimates are more accurate than those of the current best practice. An empirical demonstration yields smaller estimates of the PM effect on hyperglycemia risk, with tighter confidence bands. SIGNIFICANCE Sharing information between environmental scientists and epidemiologists improves health effect estimates. Our estimator is a principled approach for harnessing this information exchange and may be applied to any two-stage study.
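The bias mechanism described in the BACKGROUND can be illustrated with a short simulation. It assumes the first-stage imputation error behaves like classical additive noise, which is only one simple case (Berkson-type error, typical when a model predicts an expected exposure, behaves differently), and it does not implement the authors' optimal-design estimator. Under this assumption, the naive second-stage slope shrinks toward zero by the factor var(X) / (var(X) + var(U)).

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (model with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_two_stage(beta=2.0, sigma_u=1.0, n=100_000, seed=0):
    """Return (slope on true exposure, slope on error-prone imputed exposure)."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]      # exposure, variance 1
    y = [beta * x + rng.gauss(0.0, 1.0) for x in x_true]   # health outcome
    # Assumption: stage-1 imputation error is classical, independent noise.
    x_imputed = [x + rng.gauss(0.0, sigma_u) for x in x_true]
    return ols_slope(x_true, y), ols_slope(x_imputed, y)


true_slope, naive_slope = simulate_two_stage()
```

With beta = 2 and sigma_u = 1 the attenuation factor is 1/2, so the naive slope should land near 1.0 while the slope on the true exposure stays near 2.0.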
Affiliation(s)
- Ron Sarafian
- Department of Industrial Engineering, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Itai Kloog
- Department of Geography and Environmental Development, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Jonathan D Rosenblatt
- Department of Industrial Engineering, Ben-Gurion University of the Negev, Be'er Sheva, Israel
3. Zhang Y, Wu W, Li Y, Li Y. An investigation of PM2.5 concentration changes in Mid-Eastern China before and after COVID-19 outbreak. Environment International 2023; 175:107941. [PMID: 37146469 PMCID: PMC10119641 DOI: 10.1016/j.envint.2023.107941] [Received: 01/25/2023] [Revised: 03/24/2023] [Accepted: 04/17/2023]
Abstract
With the Chinese government revising ambient air quality standards and strengthening the monitoring and management of pollutants such as PM2.5, concentrations of air pollutants in China have gradually decreased in recent years. Meanwhile, the strict control measures taken by the Chinese government in response to COVID-19 in 2020 had a profound impact on pollutant reduction in China. Investigating changes in pollutant concentrations in China before and after the COVID-19 outbreak is therefore necessary, but the number of monitoring stations is limited, making a high-spatial-density investigation difficult. In this study, we construct a deep learning model based on multi-source data, including remotely sensed AOD products, other reanalysis data, and ground monitoring station data. Combining this model with satellite remote sensing, we develop a method for investigating PM2.5 concentration changes at high spatial density, and we analyze the seasonal, annual, spatial, and temporal characteristics of PM2.5 concentrations in Mid-Eastern China from 2016 to 2021, as well as the impact of epidemic lockdown and control measures on regional and provincial PM2.5 concentrations. We find that PM2.5 concentrations in Mid-Eastern China during these years are mainly characterized by "north-south superiority and central inferiority"; seasonal differences are evident, with the highest concentrations in winter, the second highest in autumn, and the lowest in summer; and overall concentrations decrease gradually. According to our experimental results, the annual average PM2.5 concentration decreased by 3.07% in 2020, while concentrations during the shutdown period decreased by 24.53%, which was probably caused by China's epidemic control measures. At the same time, PM2.5 concentrations dropped by more than 30% in some provinces with a large share of secondary industry.
By 2021, PM2.5 concentrations rebounded slightly, rising by 10% in most provinces.
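The seasonal ranking and the percentage changes reported in this abstract rest on two simple aggregations. The sketch assumes the meteorological-season convention (December to February as winter) and groups by calendar month only, so a December value is not linked to the following January and February; both the convention and the helper names are assumptions for illustration, and this is not the authors' deep learning pipeline.

```python
from collections import defaultdict

# Meteorological seasons (an assumed convention): DJF, MAM, JJA, SON.
SEASON_OF_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}


def seasonal_means(records):
    """records: iterable of (month, pm25) pairs; returns mean PM2.5 per season."""
    sums, counts = defaultdict(float), defaultdict(int)
    for month, value in records:
        season = SEASON_OF_MONTH[month]
        sums[season] += value
        counts[season] += 1
    return {season: sums[season] / counts[season] for season in sums}


def percent_change(before, after):
    """Relative change in percent; negative values mean a decrease."""
    return (after - before) / before * 100.0
```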
Affiliation(s)
- Yongjun Zhang
- School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
- Wenpin Wu
- School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
- Yiliang Li
- School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
- Yansheng Li
- School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
4. Estimation and Analysis of PM2.5 Concentrations with NPP-VIIRS Nighttime Light Images: A Case Study in the Chang-Zhu-Tan Urban Agglomeration of China. International Journal of Environmental Research and Public Health 2022; 19:4306. [PMID: 35409987 PMCID: PMC8998965 DOI: 10.3390/ijerph19074306] [Received: 03/08/2022] [Revised: 03/30/2022] [Accepted: 03/31/2022]
Abstract
Rapid economic and social development has caused serious atmospheric environmental problems, and the temporal and spatial distribution of PM2.5 concentrations has become an important research topic in monitoring sustainable social development. Based on NPP-VIIRS nighttime light images, meteorological data, and SRTM DEM data, this article builds a PM2.5 concentration estimation model for the Chang-Zhu-Tan urban agglomeration. First, the partial least squares method is used to analyze the correlation of PM2.5 concentration with nighttime light radiance, meteorological elements (temperature, relative humidity, and wind speed), and topographic elements (elevation, slope, and topographic relief). Second, seasonal and annual PM2.5 concentration estimation models, including multiple linear regression, random forest, support vector regression, and Gaussian process regression, are constructed with different factor sets. Finally, the accuracy of the estimation models for the Chang-Zhu-Tan urban agglomeration is analyzed, and the spatial distribution of PM2.5 concentration is retrieved. The results show that meteorological elements have the strongest correlation with PM2.5 concentration and topographic elements the weakest. For seasonal estimation, both the multiple linear regression and the machine learning models give their worst results in spring; the multiple linear regression models perform best in winter, and the machine learning models perform best for annual estimation. The study also found significant differences in the temporal and spatial distribution of PM2.5 concentrations.
The methods in this article overcome, to a certain extent, the high cost and limited spatial resolution of traditional large-scale PM2.5 monitoring and can provide a reference for PM2.5 concentration estimation and prediction based on satellite remote sensing.
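The factor screening step can be sketched as a correlation ranking. Note the swap: the article uses partial least squares, whereas this sketch uses plain Pearson correlation between each factor and PM2.5, which is simpler but ignores collinearity among factors. Factor names and values used with it are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_factors(pm25, factors):
    """factors: dict of name -> values aligned with pm25.
    Returns (name, r) pairs sorted by |r|, strongest correlation first."""
    scored = [(name, pearson_r(values, pm25)) for name, values in factors.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)
```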
5. An Estimation Method for PM2.5 Based on Aerosol Optical Depth Obtained from Remote Sensing Image Processing and Meteorological Factors. Remote Sensing 2022. [DOI: 10.3390/rs14071617]
Abstract
Understanding the spatiotemporal variations in the mass concentration of particulate matter ≤2.5 µm in size (PM2.5) is important for controlling environmental pollution. Currently, PM2.5 ground monitoring sites in China are relatively sparse, which limits spatial coverage. Aerosol optical depth (AOD) data obtained from satellite remote sensing provide insights into the spatiotemporal distribution of regional pollution sources. In this study, the Multi-Angle Implementation of Atmospheric Correction (MAIAC) AOD product (1 km resolution) from the Moderate Resolution Imaging Spectroradiometer (MODIS) and hourly ground measurements of PM2.5 concentration from 2015 to 2020 in Dalian, China, were used. Although trends in PM2.5 and AOD were consistent over time, there were seasonal differences. The spatial distributions of AOD and PM2.5 were consistent (R2 = 0.922), with higher PM2.5 values in industrial areas. The test set was divided by year in a cross-validation scheme, with AOD and meteorological factors as input variables and PM2.5 as the output variable. A backpropagation neural network (BPNN) model with joint cross-validation was established, and its stability was evaluated. The trend in the BPNN predictions was consistent with the monitored values, and the estimates improved when meteorological factors were included; the coefficient of determination (R2) and the RMSE standard deviation (SD) between predicted and monitored values in the test set were 0.663–0.752 and 0.01–0.05 μg/m3, respectively. The BPNN was simpler and required less training time than a regression model and support vector regression (SVR). This study demonstrates that a BPNN can be effectively applied to MAIAC AOD data to estimate PM2.5 concentrations.
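One reading of dividing the test set by year is a leave-one-year-out scheme, in which each year in turn is held out as the test set. The generator below is a sketch under that assumption; it only produces index splits and does not implement the BPNN.

```python
def leave_one_year_out(years):
    """Yield (held_out_year, train_idx, test_idx), one split per distinct year.

    years: the observation year of each sample, in sample order.
    """
    for held_out in sorted(set(years)):
        train_idx = [i for i, y in enumerate(years) if y != held_out]
        test_idx = [i for i, y in enumerate(years) if y == held_out]
        yield held_out, train_idx, test_idx
```

Holding out whole years keeps temporally adjacent, strongly autocorrelated samples from appearing on both sides of a split.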
6. PM2.5 Modeling and Historical Reconstruction over the Continental USA Utilizing GOES-16 AOD. Remote Sensing 2021. [DOI: 10.3390/rs13234788]
Abstract
In this study, we present a nationwide machine learning model for hourly PM2.5 estimation over the continental United States (US) using high-temporal-resolution Aerosol Optical Depth (AOD) data from the Geostationary Operational Environmental Satellite (GOES-16), meteorological variables from the European Centre for Medium-Range Weather Forecasts (ECMWF), and ancillary data collected between May 2017 and December 2020. A sensitivity analysis on the predictor variables was conducted to determine the optimal model. GOES-16 AOD, ECMWF variables, and ancillary data proved to be effective variables for PM2.5 estimation and historical reconstruction, with the model achieving an average mean absolute error (MAE) of 3.0 μg/m3 and a root mean square error (RMSE) of 5.8 μg/m3. This study also found that both model performance and site-measured PM2.5 concentrations show strong spatial and temporal patterns. On the temporal scale, the model performed best between 8:00 p.m. and 11:00 p.m. (UTC), had the highest coefficient of determination (R2) in autumn, and had the lowest MAE and RMSE in spring. On the spatial scale, analysis based on ancillary data shows that R2 scores correlate positively with the mean measured PM2.5 concentration at monitoring sites. Mean measured PM2.5 concentrations are positively correlated with population density and negatively correlated with elevation. Water, forests, and wetlands are associated with low PM2.5 concentrations, whereas developed land, cultivated crops, shrubland, and grassland are associated with high PM2.5 concentrations. In addition, the reconstructed PM2.5 surfaces are an important data source for pollution event tracking and PM2.5 analysis. For this purpose, hourly PM2.5 estimates were produced at 10 km × 10 km resolution from May 2017 to December 2020, and the estimates from August through November 2020, covering the California Santa Clara Unit (SCU) Lightning Complex fires, are presented.
Based on the quantitative and visualization results, this study reveals that a number of large wildfires in California had a profound impact on the values and spatiotemporal distribution of PM2.5 concentrations.
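The MAE, RMSE, and R2 figures quoted in these abstracts follow standard definitions, sketched below. This assumes R2 is computed as 1 - SS_res/SS_tot; some studies instead report the squared Pearson correlation between predictions and observations, which can differ when predictions are biased.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, RMSE and R2 (as 1 - SS_res / SS_tot) for paired values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "R2": r2}
```

RMSE is always at least as large as MAE; a wide gap between them, as in the 3.0 versus 5.8 μg/m3 reported here, points to a tail of large errors.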
7. Holloway T, Miller D, Anenberg S, Diao M, Duncan B, Fiore AM, Henze DK, Hess J, Kinney PL, Liu Y, Neu JL, O'Neill SM, Odman MT, Pierce RB, Russell AG, Tong D, West JJ, Zondlo MA. Satellite Monitoring for Air Quality and Health. Annual Review of Biomedical Data Science 2021; 4:417-447. [PMID: 34465183 DOI: 10.1146/annurev-biodatasci-110920-093120]
Abstract
Data from satellite instruments provide estimates of gas and particle levels relevant to human health, even pollutants invisible to the human eye. However, the successful interpretation of satellite data requires an understanding of how satellites relate to other data sources, as well as factors affecting their application to health challenges. Drawing from the expertise and experience of the 2016-2020 NASA HAQAST (Health and Air Quality Applied Sciences Team), we present a review of satellite data for air quality and health applications. We include a discussion of satellite data for epidemiological studies and health impact assessments, as well as the use of satellite data to evaluate air quality trends, support air quality regulation, characterize smoke from wildfires, and quantify emission sources. The primary advantage of satellite data compared to in situ measurements, e.g., from air quality monitoring stations, is their spatial coverage. Satellite data can reveal where pollution levels are highest around the world, how levels have changed over daily to decadal periods, and where pollutants are transported from urban to global scales. To date, air quality and health applications have primarily utilized satellite observations and satellite-derived products relevant to near-surface particulate matter <2.5 μm in diameter (PM2.5) and nitrogen dioxide (NO2). Health and air quality communities have grown increasingly engaged in the use of satellite data, and this trend is expected to continue. From health researchers to air quality managers, and from global applications to community impacts, satellite data are transforming the way air pollution exposure is evaluated.
Affiliation(s)
- Tracey Holloway
- Nelson Institute Center for Sustainability and the Global Environment, University of Wisconsin-Madison, Madison, Wisconsin 53726, USA; Department of Atmospheric and Oceanic Sciences, University of Wisconsin-Madison, Madison, Wisconsin 53726, USA
- Daegan Miller
- Nelson Institute Center for Sustainability and the Global Environment, University of Wisconsin-Madison, Madison, Wisconsin 53726, USA
- Susan Anenberg
- Department of Environmental and Occupational Health, George Washington University, Washington, DC 20052, USA
- Minghui Diao
- Department of Meteorology and Climate Science, San José State University, San Jose, California 95192, USA
- Bryan Duncan
- Atmospheric Chemistry and Dynamics Laboratory, NASA Goddard Space Flight Center, Greenbelt, Maryland 20771, USA
- Arlene M Fiore
- Lamont-Doherty Earth Observatory and Department of Earth and Environmental Sciences, Columbia University, Palisades, New York 10964, USA
- Daven K Henze
- Department of Mechanical Engineering, University of Colorado, Boulder, Colorado 80309, USA
- Jeremy Hess
- Department of Environmental and Occupational Health Sciences, Department of Global Health, and Department of Emergency Medicine, University of Washington, Seattle, Washington 98105, USA
- Patrick L Kinney
- School of Public Health, Boston University, Boston, Massachusetts 02215, USA
- Yang Liu
- Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, Georgia 30322, USA
- Jessica L Neu
- Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California 91109, USA
- Susan M O'Neill
- Pacific Northwest Research Station, USDA Forest Service, Seattle, Washington 98103, USA
- M Talat Odman
- School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA
- R Bradley Pierce
- Department of Atmospheric and Oceanic Sciences, University of Wisconsin-Madison, Madison, Wisconsin 53726, USA; Space Science and Engineering Center, University of Wisconsin-Madison, Madison, Wisconsin 53726, USA
- Armistead G Russell
- School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA
- Daniel Tong
- Atmospheric, Oceanic and Earth Sciences Department, George Mason University, Fairfax, Virginia 22030, USA
- J Jason West
- Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina 27599, USA
- Mark A Zondlo
- Department of Civil and Environmental Engineering, Princeton University, Princeton, New Jersey 08544, USA
8. Michael Y, Helman D, Glickman O, Gabay D, Brenner S, Lensky IM. Forecasting fire risk with machine learning and dynamic information derived from satellite vegetation index time-series. Science of the Total Environment 2021; 764:142844. [PMID: 33158519 DOI: 10.1016/j.scitotenv.2020.142844] [Received: 06/15/2020] [Revised: 09/30/2020] [Accepted: 10/01/2020]
Abstract
Fire risk mapping (mapping the probability of fire occurrence and spread) is essential for pre-fire management as well as for efficient firefighting. Most fire risk maps are generated using static information on variables such as topography, vegetation density, and instantaneous fuel wetness. Satellites are often used to provide such information. However, long-term vegetation dynamics and the cumulative dryness status of woody vegetation, which may affect fire occurrence and spread, are rarely considered in fire risk mapping. Here, we investigate the impact on fire risk mapping of two satellite-derived metrics that represent long-term vegetation status and dynamics: the long-term mean normalized difference vegetation index (NDVI) of the woody vegetation (NDVIW) and its trend (NDVIT). NDVIW represents the mean woody density in each grid cell, while NDVIT is the 5-year trend of the woody NDVI, representing the long-term dryness status of the vegetation. To produce these metrics, we decompose satellite-derived NDVI time series following a method adjusted for Mediterranean woodlands and forests. We tested whether these metrics improve fire risk mapping using three machine learning (ML) algorithms (Logistic Regression, Random Forest, and XGBoost), with the 2007 wildfires in Greece as the case study. Our results indicate that XGBoost, which accounts for variable interactions and non-linear effects, was the ML model that produced the best results. NDVIW improved model performance, while NDVIT was significant only when NDVIW was high. This NDVIW-NDVIT interaction means that the long-term dryness effect is meaningful only in places with dense woody vegetation. The proposed method can produce more accurate fire risk maps than conventional methods and can supply important dynamic information for use in fire behavior models.
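The two vegetation metrics can be sketched with the standard NDVI formula and a least-squares slope. This assumes annual woody NDVI values are already available; the time-series decomposition the authors adapted for Mediterranean woodlands is not reproduced, and the function names are illustrative.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectances."""
    return (nir - red) / (nir + red)


def linear_trend(values, times):
    """Least-squares slope of values against times (e.g., woody NDVI per year).
    A negative slope over a 5-year window would correspond to a drying trend."""
    n = len(values)
    mt, mv = sum(times) / n, sum(values) / n
    num = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    den = sum((t - mt) ** 2 for t in times)
    return num / den
```

In this sketch, NDVIW would be the mean of the annual woody NDVI values and NDVIT their slope over the 5-year window.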
Affiliation(s)
- Yaron Michael
- Department of Geography and Environment, Bar-Ilan University, Israel
- David Helman
- Department of Soil and Water Sciences, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, P.O.B. 12, Rehovot 7610001, Israel; The Advanced School for Environmental Studies, The Hebrew University of Jerusalem, Jerusalem, Israel
- Oren Glickman
- The Data Science Institute, Bar-Ilan University, Israel
- David Gabay
- The Data Science Institute, Bar-Ilan University, Israel
- Steve Brenner
- Department of Geography and Environment, Bar-Ilan University, Israel
- Itamar M Lensky
- Department of Geography and Environment, Bar-Ilan University, Israel
9. Holland O, Shaw J, Stark JS, Wilson KA. Hull fouling marine invasive species pose a very low, but plausible, risk of introduction to East Antarctica in climate change scenarios. Diversity and Distributions 2021. [DOI: 10.1111/ddi.13246]
Affiliation(s)
- Oakes Holland
- Institute for Future Environments, Queensland University of Technology, Brisbane, Australia
- Justine Shaw
- School of Biological Sciences, The University of Queensland, St. Lucia, QLD, Australia
- Australian Antarctic Division, Kingston, TAS, Australia
- Kerrie A. Wilson
- Institute for Future Environments, Queensland University of Technology, Brisbane, Australia
10. Himawari-8 Aerosol Optical Depth (AOD) Retrieval Using a Deep Neural Network Trained Using AERONET Observations. Remote Sensing 2020. [DOI: 10.3390/rs12244125]
Abstract
Spectral aerosol optical depth (AOD) estimation from satellite-measured top-of-atmosphere (TOA) reflectances is challenging because of the complicated TOA-AOD relationship and the interplay of land surface and atmospheric state variations. This task is usually undertaken using a physical model to provide a first estimate of the TOA reflectances, which are then optimized by comparison with the satellite data. Recently developed deep neural network (DNN) models provide a powerful tool for representing this complicated relationship statistically. This study presents a DNN-based methodology to estimate AOD from Himawari-8 Advanced Himawari Imager (AHI) TOA observations. One year (2017) of AHI TOA observations over the Himawari-8 full disk, collocated in space and time with Aerosol Robotic Network (AERONET) AOD data, was used to derive a total of 14,154 training and validation samples. The TOA reflectances in all six AHI solar bands, three TOA reflectance ratios derived from dark-target assumptions, sun-sensor geometry, and auxiliary data are used as predictors to estimate AOD at 500 nm. The DNN AOD is validated by separating training and validation samples using random k-fold cross-validation and AERONET site-specific leave-one-station-out validation, and is compared with a random forest regression estimator and the Japan Meteorological Agency (JMA) AOD. The DNN AOD shows high accuracy: (1) RMSE = 0.094, R2 = 0.915 for k-fold cross-validation, and (2) RMSE = 0.172, R2 = 0.730 for leave-one-station-out validation. The k-fold cross-validation overestimates the DNN accuracy because the training and validation samples may come from the same AHI pixel location. The leave-one-station-out validation reflects the accuracy for large-area applications where there are no training samples for the pixel location to be estimated. The DNN AOD is more accurate than the random forest AOD and the JMA AOD.
In addition, the contribution of the dark-target derived TOA ratio predictors is examined and confirmed, and the sensitivity to the DNN structure is discussed.
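The gap between the k-fold and leave-one-station-out scores comes from samples of the same location falling on both sides of a split. The sketch below makes that concrete with hypothetical station labels: a random k-fold split (shuffled indices dealt into folds) can leave test stations also present in training, whereas leave-one-station-out never does. The helper names are illustrative, not from the paper.

```python
import random


def random_kfold(n_samples, k, seed=0):
    """Random k-fold: shuffle sample indices and deal them into k test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def leave_one_station_out(stations):
    """One test fold per station, containing all samples from that station."""
    return [[i for i, s in enumerate(stations) if s == held]
            for held in sorted(set(stations))]


def leaked_stations(stations, test_idx):
    """Stations that appear both in this test fold and in its training set."""
    test_set = set(test_idx)
    train_st = {s for i, s in enumerate(stations) if i not in test_set}
    test_st = {stations[i] for i in test_idx}
    return train_st & test_st
```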
11. Estimating PM2.5 Concentrations Using Spatially Local Xgboost Based on Full-Covered SARA AOD at the Urban Scale. Remote Sensing 2020. [DOI: 10.3390/rs12203368]
Abstract
The adverse effects of PM2.5 have drawn extensive concern, and identifying its spatial distribution is of great significance. Satellite-derived aerosol optical depth (AOD) has been widely used for PM2.5 estimation. However, its coarse spatial resolution and the gaps caused by missing data impede its application at the urban scale. Additionally, obtaining accurate results in unsampled areas when PM2.5 ground sites are insufficient and sparsely distributed is a challenging issue for estimating the spatial distribution of PM2.5. This paper aims to develop a model, spatially local extreme gradient boosting (SL-XGB), that combines the powerful fitting ability of machine learning with the optimal bandwidths of local models to better estimate PM2.5 concentration at the urban scale, using Beijing as the study area. This paper adopts simplified high-resolution MODIS aerosol retrieval algorithm (SARA) AOD at 500 m resolution as the major independent variable, ensuring that the estimation can be carried out at a fine scale. Moreover, the extreme gradient boosting (XGBoost) model was adopted to fill the gaps in SARA AOD, improving its availability. Then, based on full-coverage SARA AOD and other multisource data, the SL-XGB model, which integrates multiple local XGBoost models with particular optimal bandwidths, was trained to estimate PM2.5 concentration. For comparison, SL-XGB and two other models, XGBoost and geographically weighted regression (GWR), were evaluated by 10-fold cross-validation (CV). The sample-based CV results reveal that SL-XGB performed best as assessed by R2 (0.88), root mean square error (RMSE = 24.08 μg/m3), and mean prediction error (MPE = 16.90 μg/m3). SL-XGB also performed best in the site-based CV, with an R2 of 0.86, an RMSE of 26.15 μg/m3, and an MPE of 17.97 μg/m3, which shows its good spatial generalization ability.
These results demonstrate that SL-XGB can better handle non-linearity and spatial heterogeneity simultaneously despite spatially limited data at the urban scale. In terms of spatial distribution, PM2.5 concentrations increased gradually from northwest to southeast Beijing, with abundant spatial detail. Overall, the proposed approach showed outstanding performance for PM2.5 estimation and can support preventive pollution control and mitigation at the urban scale.
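SL-XGB combines local models with optimal bandwidths. A common way to express "local" in geographically weighted methods is a distance kernel; the sketch below computes GWR-style Gaussian kernel weights and selects the samples within one bandwidth of a site. The kernel choice, the cut-off rule, and the function names are assumptions, and the paper's bandwidth optimization is not shown. Such weights could, for example, be passed as `sample_weight` to an XGBoost regressor's `fit` method when training each local model.

```python
import math


def gaussian_kernel_weights(center, points, bandwidth):
    """GWR-style Gaussian kernel: w = exp(-0.5 * (d / bandwidth) ** 2)."""
    cx, cy = center
    return [math.exp(-0.5 * (math.hypot(px - cx, py - cy) / bandwidth) ** 2)
            for px, py in points]


def local_training_set(center, points, bandwidth):
    """(index, weight) pairs for samples within one bandwidth of center.
    The hard cut-off at one bandwidth is a hypothetical selection rule."""
    weights = gaussian_kernel_weights(center, points, bandwidth)
    cx, cy = center
    return [(i, w) for i, ((px, py), w) in enumerate(zip(points, weights))
            if math.hypot(px - cx, py - cy) <= bandwidth]
```

A smaller bandwidth makes each local model more site-specific but leaves it fewer training samples, which is the trade-off an optimal-bandwidth search balances.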
12. Just AC, Arfer KB, Rush J, Dorman M, Shtein A, Lyapustin A, Kloog I. Advancing methodologies for applying machine learning and evaluating spatiotemporal models of fine particulate matter (PM2.5) using satellite data over large regions. Atmospheric Environment 2020; 239:117649. [PMID: 33122961 PMCID: PMC7591135 DOI: 10.1016/j.atmosenv.2020.117649]
Abstract
Reconstructing the distribution of fine particulate matter (PM2.5) in space and time, even far from ground monitoring sites, is an important exposure science contribution to epidemiologic analyses of PM2.5 health impacts. Flexible statistical prediction methods have demonstrated the integration of satellite observations with other predictors, yet these algorithms are susceptible to overfitting the spatiotemporal structure of the training datasets. We present a new approach for predicting PM2.5 using machine-learning methods and for evaluating prediction models with the goal of making predictions where they were not previously available. We apply extreme gradient boosting (XGBoost) to predict daily PM2.5 at 1 × 1 km2 resolution for a 13-state region in the Northeastern USA for the years 2000-2015 using satellite-derived aerosol optical depth, and we implement recursive feature selection to develop a parsimonious model. We demonstrate excellent predictions of withheld observations but also contrast an RMSE of 3.11 μg/m3 in our spatial cross-validation, which withholds nearby sites, with an overfit RMSE of 2.10 μg/m3 from a more conventional random ten-fold split of the dataset. As exposure science moves forward with advanced machine-learning approaches for spatiotemporal modeling of air pollutants, our results show the importance of addressing data leakage in training, overfitting to spatiotemporal structure, and the effect of dense urban sub-networks of ground monitoring sites on model evaluation. The strengths of the resulting modeling approach for exposure in epidemiologic studies of PM2.5 include improved efficiency, parsimony, and interpretability with robust validation, while still accommodating complex spatiotemporal relationships.
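The spatial cross-validation described in this abstract withholds sites near the test site, not just the test site itself. A minimal sketch, assuming sites are given in projected kilometre coordinates and using a hypothetical buffer radius, is shown below; the exact buffering rule in the paper may differ.

```python
import math


def buffered_site_split(sites, held_out, radius_km):
    """Spatial CV split for one fold (hypothetical buffering rule).

    sites: dict of site name -> (x_km, y_km) in a projected coordinate system.
    The held-out site forms the test set; every other site within radius_km
    of it is excluded from training so nearby monitors cannot leak information.
    """
    hx, hy = sites[held_out]
    train = [name for name, (x, y) in sites.items()
             if name != held_out and math.hypot(x - hx, y - hy) > radius_km]
    return train, [held_out]


def all_folds(sites, radius_km):
    """One buffered fold per site, keyed by the held-out site."""
    return {name: buffered_site_split(sites, name, radius_km) for name in sites}
```

Dropping neighbours matters most in dense urban sub-networks, where a plain random split lets a co-located monitor stand in for the held-out one.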
Affiliation(s)
- Allan C Just
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Kodi B Arfer
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Johnathan Rush
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Michael Dorman
- The Department of Geography and Environmental Development, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Alex Shtein
- The Department of Geography and Environmental Development, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Itai Kloog
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Department of Geography and Environmental Development, Ben-Gurion University of the Negev, Beer Sheva, Israel
13
Comparison of Different Missing-Imputation Methods for MAIAC (Multiangle Implementation of Atmospheric Correction) AOD in Estimating Daily PM2.5 Levels. REMOTE SENSING 2020. [DOI: 10.3390/rs12183008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022]
Abstract
The extensive missingness of satellite aerosol retrievals (aerosol optical depth, AOD) impairs the prediction of ground-level PM2.5 concentrations and may lead to unavoidable biases. An appropriate imputation method for the missing values has not yet been well developed. This study developed a two-stage approach (an AOD-imputation stage and a PM2.5-prediction stage) to predict short-term PM2.5 exposure in mainland China from 2013 to 2018. At the AOD-imputation stage, geostatistical methods and machine learning (ML) algorithms were examined for interpolating 1 km satellite aerosol retrievals. At the PM2.5-prediction stage, daily PM2.5 levels were predicted at 1 km resolution from the interpolated AOD and meteorological data. The statistical performance of the different interpolation methods was compared comprehensively at each stage. The original coverage of retrieved AOD was 15.46% on average. At the AOD-imputation stage, ML methods produced higher AOD coverage (98.64%) than geostatistical methods (21.43–87.31%). Among the ML algorithms, random forest (RF) and extreme gradient boosting (XGBoost) produced better-quality interpolated AOD (RF-AOD and XG-AOD; CV R2 = 0.89 and 0.85) than the other algorithms (0.49–0.78), but XGBoost required only 15% of the computing time of RF. At the PM2.5-prediction stage, neither RF-AOD nor XG-AOD could guarantee higher accuracy in PM2.5 estimation (CV R2 = 0.88 (RF- or XG-AOD) compared with 0.85 (original)) or more stable spatial and temporal extrapolation (spatial (temporal) CV R2 = 0.83 (0.83), 0.82 (0.82), and 0.65 (0.61) for RF, XG, and original, respectively). At the AOD-imputation stage, gap-filling efficiency depended more on external information, while gap-filling accuracy relied more on model structure.
At the PM2.5-prediction stage, efficient AOD interpolation (that is, the ability to eliminate missing data) was a precondition for stable spatial and temporal extrapolation, while the quality of the interpolated AOD yielded less significant improvements. XG-AOD was found to be the better choice for estimating daily PM2.5 exposure in health assessments.
14
Just AC, Liu Y, Sorek-Hamer M, Rush J, Dorman M, Chatfield R, Wang Y, Lyapustin A, Kloog I. Gradient boosting machine learning to improve satellite-derived column water vapor measurement error. ATMOSPHERIC MEASUREMENT TECHNIQUES 2020; 13:4669-4681. [PMID: 33193906 PMCID: PMC7665162 DOI: 10.5194/amt-13-4669-2020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 05/27/2023]
Abstract
The atmospheric products of the Multi-Angle Implementation of Atmospheric Correction (MAIAC) algorithm include column water vapor (CWV) at a 1 km resolution, derived from daily overpasses of NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) instruments aboard the Aqua and Terra satellites. We have recently shown that machine learning using extreme gradient boosting (XGBoost) can improve the estimation of MAIAC aerosol optical depth (AOD). Although MAIAC CWV is generally well validated (Pearson's R >0.97 versus CWV from AERONET sun photometers), it has not yet been assessed whether machine-learning approaches can further improve CWV. Using a novel spatiotemporal cross-validation approach to avoid overfitting, our XGBoost model, with nine features derived from land use terms, date, and ancillary variables from the MAIAC retrieval, quantifies and can correct a substantial portion of measurement error relative to collocated measurements at AERONET sites (26.9% and 16.5% decrease in root mean square error (RMSE) for Terra and Aqua datasets, respectively) in the Northeastern USA, 2000-2015. We use machine-learning interpretation tools to illustrate complex patterns of measurement error and describe a positive bias in MAIAC Terra CWV worsening in recent summertime conditions. We validate our predictive model on MAIAC CWV estimates at independent stations from the SuomiNet GPS network where our corrections decrease the RMSE by 19.7% and 9.5% for Terra and Aqua MAIAC CWV. Empirically correcting for measurement error with machine-learning algorithms is a postprocessing opportunity to improve satellite-derived CWV data for Earth science and remote sensing applications.
Affiliation(s)
- Allan C. Just
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Yang Liu
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Meytar Sorek-Hamer
- Universities Space Research Association (USRA), Mountain View, California, USA
- NASA Ames Research Center, Mountain View, California, USA
- Johnathan Rush
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Michael Dorman
- Department of Geography and Environmental Development, Ben-Gurion University of the Negev, Beersheba, Israel
- Yujie Wang
- Joint Center for Earth Systems Technology, University of Maryland, Baltimore County, Baltimore, Maryland, USA
- NASA Goddard Space Flight Center, Greenbelt, Maryland, USA
- Itai Kloog
- Department of Geography and Environmental Development, Ben-Gurion University of the Negev, Beersheba, Israel
15
Extreme Gradient Boosting Model for Rain Retrieval using Radar Reflectivity from Various Elevation Angles. REMOTE SENSING 2020. [DOI: 10.3390/rs12142203] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/02/2023]
Abstract
The purpose of this study was to develop an optimal estimation model for rainfall rate retrieval from radar reflectivity, thereby providing effective rainfall information for disaster prevention. A process was designed for evaluating the optimal retrieval models using various dataset combinations of radar reflectivity and ground meteorological attributes. Ground meteorological attributes (such as relative humidity, wind speed, and precipitation) were obtained from the land-based weather stations of Taiwan’s Central Weather Bureau (CWB). This study used radar reflectivity at nine elevation angles provided by the Volume Coverage Pattern 21 (VCP 21) of the Hualien weather surveillance radar station. The models were built using multiple machine learning algorithms, including linear regression (REG), support vector regression (SVR), and extreme gradient boosting (XGBoost), in addition to the Marshall–Palmer formula (MP). The study examined 14 typhoons that occurred from 2008 to 2017 at Chenggong station in southeast Taiwan and Lanyu station on the outlying islands, and the four largest rainfall events were designated as test typhoons: Nanmadol (2011), Tembin (2012), Matmo (2014), and Nepartak (2016). The results indicated that, for rainfall retrieval, radar reflectivity at a scanning (elevation) angle of 6.0° combined with ground meteorological attributes provided the optimal input variables for Chenggong station, whereas radar reflectivity at an elevation angle of 4.3° combined with ground meteorological attributes was optimal for Lanyu station. In terms of model performance, XGBoost models had the lowest error index at both Chenggong and Lanyu stations compared with the MP, REG, and SVR models. The XGBoost model at Lanyu station had the highest efficiency coefficient (0.903), and the one at Chenggong station had the second highest (0.885).
As a result, pairing the combination of optimal radar reflectivity and ground meteorological attributes, as verified by the evaluation process, with a high-efficiency algorithm (XGBoost) can effectively increase the accuracy of rainfall retrieval during typhoons.
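The MP baseline that the learners are compared against is the classic Z-R power law Z = a·R^b, with Z in mm^6 m^-3 and R in mm/h. A minimal inversion from reflectivity in dBZ is sketched below. It assumes the standard Marshall–Palmer coefficients (a = 200, b = 1.6), which the abstract does not state; any locally tuned coefficients used by the authors may differ.

```python
def dbz_to_z(dbz):
    """Convert reflectivity in dBZ to linear Z (mm^6 m^-3): Z = 10**(dBZ/10)."""
    return 10.0 ** (dbz / 10.0)


def marshall_palmer_rain_rate(dbz, a=200.0, b=1.6):
    """Invert Z = a * R**b to get rain rate R in mm/h.

    The defaults are the classic Marshall-Palmer coefficients; they are an
    assumption here, not values taken from the cited study.
    """
    return (dbz_to_z(dbz) / a) ** (1.0 / b)


for dbz in (20.0, 30.0, 40.0):
    print(dbz, round(marshall_palmer_rain_rate(dbz), 2))
```

Because the formula uses reflectivity alone, it cannot draw on the ground meteorological attributes that the study found useful. That is one plausible reason the data-driven models outperform it.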
16
Abdelkareem M, Bamousa AO, Hamimi Z, Kamal El-Din GM. Multispectral and RADAR images integration for geologic, geomorphic, and structural investigation in southwestern Arabian Shield, Al Qunfudhah area, Saudi Arabia. JOURNAL OF TAIBAH UNIVERSITY FOR SCIENCE 2020. [DOI: 10.1080/16583655.2020.1741957] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 10/24/2022]
Affiliation(s)
- Gamal M. Kamal El-Din
- Geology Department, South Valley University, Qena, Egypt
- Prince Sattam Bin Abdulaziz University, Al Kharj, Saudi Arabia
17
Predicting Fine Particulate Matter (PM2.5) in the Greater London Area: An Ensemble Approach using Machine Learning Methods. REMOTE SENSING 2020. [DOI: 10.3390/rs12060914] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Indexed: 12/22/2022]
Abstract
Estimating air pollution exposure has long been a challenge for environmental health researchers. Technological advances and novel machine learning methods have allowed us to increase the geographic range and accuracy of exposure models, making them a valuable tool for conducting health studies and identifying pollution hotspots. Here, we have created a prediction model for daily PM2.5 levels in the Greater London area from 1st January 2005 to 31st December 2013 using an ensemble machine learning approach incorporating satellite aerosol optical depth (AOD), land use, and meteorological data. The predictions were made on a 1 km × 1 km scale over 3960 grid cells. The ensemble included predictions from three different machine learners: a random forest (RF), a gradient boosting machine (GBM), and a k-nearest neighbor (KNN) approach. Our ensemble model performed very well, with a ten-fold cross-validated R2 of 0.828. Of the three machine learners, the random forest outperformed the GBM and KNN. Our model was particularly adept at predicting day-to-day changes in PM2.5 levels, with an out-of-sample temporal R2 of 0.882. However, its ability to predict spatial variability was weaker, with an R2 of 0.396. We believe this to be due to the smaller spatial variation in pollutant levels in this area.
18
Paranjpe I, Chaudhary K, Paranjpe M, O'Hagan R, Manna S, Jaladanki S, Kapoor A, Horowitz C, DeFelice N, Cooper R, Glicksberg B, Bottinger EP, Just AC, Nadkarni GN. Association of APOL1 Risk Genotype and Air Pollution for Kidney Disease. Clin J Am Soc Nephrol 2020; 15:401-403. [PMID: 32079610 PMCID: PMC7057301 DOI: 10.2215/cjn.11921019] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 01/13/2023]
Affiliation(s)
- Ishan Paranjpe
- The Charles Bronfman Institute for Personalized Medicine
- The Hasso Plattner Institute of Digital Health at Mount Sinai
- Manish Paranjpe
- Harvard-MIT, Division of Health Sciences and Technology, Harvard Medical School, Boston, Massachusetts
- Ross O'Hagan
- The Charles Bronfman Institute for Personalized Medicine
- Sayan Manna
- The Charles Bronfman Institute for Personalized Medicine
- Arjun Kapoor
- The Charles Bronfman Institute for Personalized Medicine
- Nicholas DeFelice
- Department of Public Health Sciences, Loyola University School of Medicine, Chicago, Illinois
- Richard Cooper
- Department of Public Health Sciences, Loyola University School of Medicine, Chicago, Illinois
- Erwin P Bottinger
- The Hasso Plattner Institute of Digital Health at Mount Sinai
- Division of Nephrology and Hypertension, Department of Medicine
- Allan C Just
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, New York
- Girish N Nadkarni
- The Charles Bronfman Institute for Personalized Medicine
- The Hasso Plattner Institute of Digital Health at Mount Sinai
- Division of Nephrology and Hypertension, Department of Medicine
19
A Robust Deep Learning Approach for Spatiotemporal Estimation of Satellite AOD and PM2.5. REMOTE SENSING 2020. [DOI: 10.3390/rs12020264] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 12/11/2022]
Abstract
Accurate estimation of fine particulate matter with diameter ≤2.5 μm (PM2.5) at a high spatiotemporal resolution is crucial for the evaluation of its health effects. Previous studies face multiple challenges, including limited ground measurements and limited availability of spatiotemporal covariates. Although the multiangle implementation of atmospheric correction (MAIAC) retrieves satellite aerosol optical depth (AOD) at a high spatiotemporal resolution, massive non-random missingness considerably limits its application in PM2.5 estimation. Here, a deep learning approach, bootstrap aggregating (bagging) of autoencoder-based residual deep networks, was developed to robustly impute MAIAC AOD and then estimate PM2.5 at a high spatial (1 km) and temporal (daily) resolution. The base model consisted of autoencoder-based residual networks, in which residual connections were introduced to improve learning performance. Bagging of residual networks was used to generate ensemble predictions for better accuracy and uncertainty estimates. As a case study, the proposed approach was applied to impute daily satellite AOD and subsequently estimate daily PM2.5 in the Jing-Jin-Ji metropolitan region of China in 2015. The presented approach achieved competitive performance in AOD imputation (mean test R2: 0.96; mean test RMSE: 0.06) and PM2.5 estimation (test R2: 0.90; test RMSE: 22.3 μg/m3). In additional independent tests using ground AERONET AOD and PM2.5 measurements at the monitoring station of the U.S. Embassy in Beijing, this approach achieved high R2 (0.82–0.97). Compared with the state-of-the-art machine learning method XGBoost, the proposed approach generated more reasonable spatial variation in the predicted PM2.5 surfaces. The publicly available covariates used included meteorology, MERRA2 PBLH and AOD, coordinates, and elevation. Other covariates, such as cloud fraction or land use, were not used because they were unavailable.
The results of validation and independent testing demonstrate the usefulness of the proposed approach in exposure assessment of PM2.5 using satellite AOD having massive missing values.
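Bagging, as described above, trains each ensemble member on a bootstrap resample of the training rows. The spread of the members' predictions then serves as an uncertainty estimate. The sketch below shows only that mechanism. It swaps in a one-variable least-squares line as a hypothetical stand-in for the autoencoder-based residual network, so it illustrates bagging in general rather than the authors' model.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares line y = a + b*x (hypothetical stand-in for the residual network)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: (my - b * mx) + b * x


def bagged_predict(xs, ys, x_new, n_models=50, seed=0):
    """Bootstrap-aggregate the base learner; return (mean, std) of member predictions."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        model = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(model(x_new))
    # The ensemble mean is the prediction; the spread across members is the uncertainty.
    return statistics.fmean(preds), statistics.stdev(preds)


xs = [0.1 * i for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]  # noise-free synthetic data
mean, std = bagged_predict(xs, ys, 0.5)
print(round(mean, 3), round(std, 6))
```

With noise-free data every member recovers the same line, so the spread collapses to about zero. With noisy data the standard deviation grows, and it grows most where the training data are sparse.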
20
Shtein A, Kloog I, Schwartz J, Silibello C, Michelozzi P, Gariazzo C, Viegi G, Forastiere F, Karnieli A, Just AC, Stafoggia M. Estimating Daily PM 2.5 and PM 10 over Italy Using an Ensemble Model. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2020; 54:120-128. [PMID: 31749355 DOI: 10.1021/acs.est.9b04279] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Indexed: 06/10/2023]
Abstract
Spatiotemporally resolved particulate matter (PM) estimates are essential for reconstructing long- and short-term exposures in epidemiological research. Improved estimates of PM2.5 and PM10 concentrations were produced over Italy for 2013-2015 using satellite remote-sensing data and an ensemble modeling approach. The following modeling stages were used: (1) missing values of the satellite-based aerosol optical depth (AOD) product were imputed using a spatiotemporal land-use random-forest (RF) model incorporating AOD data from atmospheric ensemble models; (2) daily PM estimates were produced using four modeling approaches: linear mixed effects, RF, extreme gradient boosting, and a chemical transport model, the flexible air quality regional model. The filled-in MAIAC AOD, together with additional spatial and temporal predictors, was used as input to the first three models; (3) a geographically weighted generalized additive model (GAM) ensemble was used to fuse the estimates from the four models, allowing the weight of each model to vary over space and time. The GAM ensemble model outperformed the four separate models, decreasing the cross-validated root mean squared error by 1-42%, depending on the model. The spatiotemporally resolved PM estimates produced by the proposed model can be applied in future epidemiological studies across Italy.
Affiliation(s)
- Alexandra Shtein
- Department of Geography and Environmental Development, Ben-Gurion University of the Negev, Beer Sheva 8410501, Israel
- Itai Kloog
- Department of Geography and Environmental Development, Ben-Gurion University of the Negev, Beer Sheva 8410501, Israel
- Joel Schwartz
- Department of Environmental Health, Harvard T. H. Chan School of Public Health, Boston 02115, Massachusetts, United States
- Paola Michelozzi
- Department of Epidemiology, Lazio Regional Health Service/ASL Roma 1, Rome 00147, Italy
- Claudio Gariazzo
- Occupational and Environmental Medicine, Epidemiology and Hygiene Department, Italian Workers' Compensation Authority (INAIL), Monte Porzio Catone (RM) 00078, Italy
- Giovanni Viegi
- Institute for Biomedical Research and Innovation, National Research Council, Palermo 90146, Italy
- Francesco Forastiere
- Institute for Biomedical Research and Innovation, National Research Council, Palermo 90146, Italy
- Environmental Research Group, King's College, London SE1 9NH, U.K.
- Arnon Karnieli
- Jacob Blaustein Institutes for Desert Research, Ben-Gurion University of the Negev, Sede Boker Campus 84990, Israel
- Allan C Just
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, New York 10029, United States
- Massimo Stafoggia
- Department of Epidemiology, Lazio Regional Health Service/ASL Roma 1, Rome 00147, Italy
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm 171 77, Sweden
21
Urban Health Related Air Quality Indicators over the Middle East and North Africa Countries Using Multiple Satellites and AERONET Data. REMOTE SENSING 2019. [DOI: 10.3390/rs11182096] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 11/16/2022]
Abstract
Air pollution is reported to be one of the most severe environmental problems in the Middle East and North Africa (MENA) region. Remotely sensed data from the newly available TROPOspheric Monitoring Instrument (TROPOMI) on board Sentinel-5 Precursor provide, for the first time, high-resolution annual mean maps of selected air quality indicators (NO2, CO, O3, and UVAI) for the MENA countries. Correlation analysis among these indicators shows the coherence of the air pollutants in urban areas. Multi-year data from Aerosol Robotic Network (AERONET) stations in nine MENA countries are used here to study the aerosol optical depth (AOD) and Ångström exponent (AE) together with other available observations. Additionally, a total of 65 machine learning models in four categories (linear regression, ensemble, decision tree, and deep neural network (DNN)) were built from multiple data sources (MODIS, MISR, OMI, and MERRA-2) to predict the best usable AOD product, evaluated against AERONET data. The DNN validates well against AERONET data and proves to be the best model for generating optimized aerosol products when ground observations are insufficient. This approach can improve knowledge of air pollutant variability and intensity in the MENA region, helping decision makers implement appropriate mitigation strategies.
22
Retrieval of Total Precipitable Water from Himawari-8 AHI Data: A Comparison of Random Forest, Extreme Gradient Boosting, and Deep Neural Network. REMOTE SENSING 2019. [DOI: 10.3390/rs11151741] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Indexed: 11/16/2022]
Abstract
Total precipitable water (TPW), the column water vapor content of the atmosphere, provides information on the spatial distribution of moisture. High-resolution TPW, together with atmospheric stability indices such as convective available potential energy (CAPE), is an effective indicator of severe weather phenomena in pre-convective atmospheric conditions. With the advent of high-performance imaging instruments onboard geostationary satellites, such as the Advanced Himawari Imager (AHI) onboard Japan's Himawari-8 and the Advanced Meteorological Imager (AMI) onboard Korea's GeoKompsat-2A, data at unprecedented spatiotemporal resolution are expected to become available (e.g., AMI plans to provide 2 km resolution data every 2 min over the northeastern part of East Asia). Deriving TPW from such high-resolution data in a timely fashion requires an efficient algorithm. Here, machine learning approaches, namely random forest (RF), extreme gradient boosting (XGB), and deep neural network (DNN), are assessed for TPW retrieval from AHI under clear-sky conditions over Northeast Asia. For the training dataset, the nine infrared brightness temperatures (BT) of AHI (BT8 to BT16, centered at 6.2, 6.9, 7.3, 8.6, 9.6, 10.4, 11.2, 12.4, and 13.3 μm, respectively), six dual-channel differences, and observation conditions such as time, latitude, longitude, and satellite zenith angle for two years (September 2016 to August 2018) are used. The corresponding TPW is prepared by integrating water vapor profiles from the European Centre for Medium-Range Weather Forecasts (ECMWF) Interim Re-Analysis (ERA-Interim). Algorithm performance is assessed using ERA-Interim and radiosonde observations (RAOB) as the reference data. The results show that the DNN model performs better than RF and XGB, with a correlation coefficient of 0.96, a mean bias of 0.90 mm, and a root mean square error (RMSE) of 4.65 mm when compared with ERA-Interim.
Similarly, DNN results in a correlation coefficient of 0.95, a mean bias of 1.25 mm, and an RMSE of 5.03 mm when compared to RAOB. Contributing variables to retrieve the TPW in each model and the spatial and temporal analysis of the retrieved TPW are carefully examined and discussed.
23
Assessment of Remote Sensing Data to Model PM10 Estimation in Cities with a Low Number of Air Quality Stations: A Case of Study in Quito, Ecuador. ENVIRONMENTS 2019. [DOI: 10.3390/environments6070085] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/21/2022]
Abstract
Monitoring air pollutant concentrations within cities is crucial for environmental management and public health policies that promote sustainable cities. In this study, we present an approach to estimate the concentration of particulate matter less than 10 µm in diameter (PM10) using an empirical land use regression (LUR) model with different remote sensing data as input. The study area is Quito, the capital of Ecuador, and the data were collected between 2013 and 2017. The model predictors are the surface reflectance bands (visible and infrared) of the Landsat-7 ETM+, Landsat-8 OLI/TIRS, and Aqua-Terra/MODIS sensors and several environmental indexes (normalized difference vegetation index, NDVI; normalized difference soil index, NDSI; soil-adjusted vegetation index, SAVI; normalized difference water index, NDWI; and land surface temperature, LST). The dependent variable is ground-measured PM10. This study also compares three sources of remote sensing data (Landsat-7 ETM+, Landsat-8 OLI, and Aqua-Terra/MODIS) for estimating PM10 concentration, and three predictive techniques (stepwise regression, partial least squares regression, and artificial neural network (ANN)) for building the model. The resulting models can estimate PM10 in regions where air quality data are limited or nonexistent. The best model is the one built with an ANN, which has the highest coefficient of determination (R2 = 0.68) and the lowest root-mean-square error (RMSE = 6.22) among all the models. The selected model therefore allows PM10 concentration maps to be generated from public remote sensing data, offering an alternative to other pollutant-estimation techniques, especially when few air quality ground stations are available.
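Several of the listed predictors are simple band arithmetic on surface reflectance. The sketch below uses common textbook forms: NDVI from near-infrared and red, SAVI with the usual soil factor L = 0.5, and the McFeeters NDWI from green and near-infrared. The abstract gives no exact formulations, so these forms are assumptions. NDWI in particular has more than one variant, and NDSI definitions vary enough that it is omitted here. The band arguments are generic placeholders rather than specific Landsat or MODIS band numbers.

```python
def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b); NaN if a + b == 0."""
    total = a + b
    return (a - b) / total if total != 0 else float("nan")


def ndvi(nir, red):
    # Normalized difference vegetation index.
    return normalized_difference(nir, red)


def savi(nir, red, soil_factor=0.5):
    # Soil-adjusted vegetation index; soil_factor (L) = 0.5 is an assumed default.
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def ndwi_mcfeeters(green, nir):
    # Green/NIR form of NDWI; other NDWI variants use NIR and SWIR instead.
    return normalized_difference(green, nir)


nir, red, green = 0.5, 0.1, 0.1  # illustrative surface reflectances, not study data
print(round(ndvi(nir, red), 3), round(savi(nir, red), 3), round(ndwi_mcfeeters(green, nir), 3))
```

In an LUR setting, each index would be computed per pixel and then extracted at, or averaged around, each monitoring station to form a predictor column.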
24
A Full-Coverage Daily Average PM2.5 Retrieval Method with Two-Stage IVW Fused MODIS C6 AOD and Two-Stage GAM Model. REMOTE SENSING 2019. [DOI: 10.3390/rs11131558] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 11/17/2022]
Abstract
Current PM2.5 retrieval maps have many missing values, which seriously hinders their use in real applications. This paper presents a framework to map full-coverage daily average PM2.5 concentrations from MODIS C6 aerosol optical depth (AOD) products and to fill missing pixels in both the AOD and PM2.5 maps. First, a two-stage inverse variance weighting (IVW) algorithm was adopted to fuse the MODIS C6 Terra and Aqua AOD products, filling missing data in the MODIS standard AOD and yielding a high-coverage daily average. Next, using the fused MODIS daily average AOD and ground-level PM2.5 in all grid cells, a two-stage generalized additive model (GAM) was implemented to obtain full-coverage PM2.5 concentrations. Experiments in the Yangtze River Delta (YRD) for 2013–2016 were carefully designed to validate the performance of the proposed framework. The results show that the two-stage IVW not only improved the spatial coverage of MODIS AOD by 230% relative to the original standard product but also preserved its data accuracy. Compared with ground-level measurements, the two-stage GAM obtained accurate PM2.5 concentration estimates (R2 = 0.78, RMSE = 19.177 μg/m3, and RPE = 28.9%). Moreover, our method performs better than the inverse distance weighting and kriging methods in mapping full-coverage daily PM2.5 concentrations. The proposed framework therefore provides a sound methodology for retrieving full-coverage daily average PM2.5 concentrations from MODIS standard AOD products.
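In inverse variance weighting, each retrieval is weighted by the reciprocal of its error variance, w_i = 1/σ_i², and the fused value is Σ w_i·x_i / Σ w_i. The less noisy product therefore dominates, and a pixel missing from one product falls back on the other. The sketch below shows a single per-pixel IVW step only. The abstract describes neither the two-stage design nor how each product's variance is estimated, so the variances are taken as given inputs and the function name is hypothetical.

```python
import math
from typing import Optional, Sequence


def ivw_fuse(values: Sequence[Optional[float]], variances: Sequence[float]) -> Optional[float]:
    """Fuse co-located retrievals (e.g. Terra and Aqua AOD) by inverse variance weighting.

    values: one retrieval per product; None or NaN marks a missing pixel.
    variances: assumed error variance for each product (must be > 0).
    Returns the weighted mean of the available values, or None if all are missing.
    """
    num = 0.0
    den = 0.0
    for value, var in zip(values, variances):
        if value is None or math.isnan(value):
            continue  # skip missing retrievals; the remaining products carry the pixel
        weight = 1.0 / var
        num += weight * value
        den += weight
    return num / den if den > 0 else None


print(ivw_fuse([0.20, 0.40], [0.01, 0.04]))  # first product weighted 4x more than the second
print(ivw_fuse([None, 0.40], [0.01, 0.04]))  # only the second product is available
```

Pixels where every product is missing stay empty (None) at this step. Those are the gaps the subsequent modeling stage has to fill.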
25
Spatio-temporal Patterns of Land Use/Land Cover Change in the Heterogeneous Coastal Region of Bangladesh between 1990 and 2017. REMOTE SENSING 2019. [DOI: 10.3390/rs11070790] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Indexed: 12/22/2022]
Abstract
Although a detailed analysis of land use and land cover (LULC) change is essential for a better understanding of increasing human-environment interactions across the coastal region of Bangladesh, substantial challenges remain in accurately classifying coastal LULC. These challenges stem from high landscape heterogeneity and the unavailability of good-quality remotely sensed data. This study, the first of its kind, implemented a unique methodological approach to this challenge. Using freely available Landsat imagery, eXtreme Gradient Boosting (XGBoost)-based informative feature selection and Random Forest (RF) classification are used to elucidate spatio-temporal patterns of LULC across coastal areas over a 28-year period (1990-2017). We show that the XGBoost feature selection approach effectively addresses high landscape heterogeneity and spectral complexity in the image data, improving RF model performance (mean user’s accuracy > 0.82). Multi-temporal LULC maps reveal that Bangladesh’s coastal areas experienced net increases in agricultural land (5.44%), built-up (4.91%), and river (4.52%) areas over the past 28 years. While vegetation cover experienced a net decrease (8.26%), an increasing vegetation trend has been observed since 2000, primarily due to the Bangladesh government’s afforestation initiatives across the southern coastal belts. These findings provide a comprehensive picture of coastal LULC patterns that policy makers and resource managers can incorporate into coastal land use and environmental management practices. This work also provides methodological insights for future research on addressing the spatial and spectral complexities of remotely sensed data used in classifying the LULC of a heterogeneous landscape.
26
Machine Learning Approaches for Outdoor Air Quality Modelling: A Systematic Review. APPLIED SCIENCES-BASEL 2018. [DOI: 10.3390/app8122570] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Indexed: 11/16/2022]
Abstract
Current studies show that traditional deterministic models struggle to capture the non-linear relationship between the concentration of air pollutants and their sources of emission and dispersion. To tackle this limitation, the most promising approach is to use statistical models based on machine learning techniques. Nevertheless, it is often unclear why one algorithm is chosen over another for a given task. This systematic review aims to clarify this question by providing the reader with a comprehensive description of the principles underlying these algorithms and how they are applied to enhance prediction accuracy. A rigorous search conforming to the PRISMA guideline was performed and resulted in the selection of the 46 most relevant journal papers in the area. Using a factorial analysis method, these studies are synthesized and linked to each other. The main findings of this literature review are that (i) machine learning is mainly applied in Eurasia and North America and (ii) estimation problems tend to use ensemble learning and regression, whereas forecasting problems make use of neural networks and support vector machines. The next challenges for this approach are to improve the prediction of pollution peaks and of contaminants that have recently come into the spotlight (e.g., nanoparticles).
27
Ceca LSD, Ferreyra MFG, Lyapustin A, Chudnovsky A, Otero L, Carreras H, Barnaba F. Satellite-based view of the aerosol spatial and temporal variability in the Córdoba region (Argentina) using over ten years of high-resolution data. ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING : OFFICIAL PUBLICATION OF THE INTERNATIONAL SOCIETY FOR PHOTOGRAMMETRY AND REMOTE SENSING (ISPRS) 2018; 145:250-267. [PMID: 31105384 PMCID: PMC6516067 DOI: 10.1016/j.isprsjprs.2018.08.016] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 06/09/2023]
Abstract
Space-based observations offer a unique opportunity to investigate the atmosphere and its changes over decadal time scales, particularly in regions lacking in situ and/or ground-based observations. In this study, we investigate the temporal and spatial variability of atmospheric particulate matter (aerosol) over the urban area of Córdoba (central Argentina) using over ten years (2003-2015) of high-resolution (1 km) satellite-based retrievals of aerosol optical depth (AOD). This fine resolution is achieved by exploiting the capabilities of a recently developed inversion algorithm (Multi-Angle Implementation of Atmospheric Correction, MAIAC) applied to the MODIS sensor datasets of the NASA Terra and Aqua platforms. Results of this investigation show a clear seasonality of AOD over the investigated area. This seasonality is shaped by an intricate superposition of aerosol sources acting over different spatial scales and affecting the region with different yearly cycles. During late winter and spring (August-October), local as well as near- and long-range transported biomass burning (BB) aerosols enhance the Córdoba aerosol load, and AOD reaches its maximum values (> 0.35 at 0.47 μm). The fine spatial resolution of the AOD data revealed that, in this period, AOD maxima are found in the rural/agricultural area around the city and reach up to the city boundaries, indicating that fires of local and near-range origin play a major role in the AOD enhancement. A reverse spatial AOD gradient is found from December to March, with the urban area showing AODs 40 to 80% higher than the city surroundings. In fact, during summer, the columnar aerosol load over the Córdoba region is dominated by local (urban and industrial) sources, likely coupled to secondary processes driven by enhanced radiation and mixing effects within a deeper planetary boundary layer (PBL).
With the support of modelled AOD data from the Modern-Era Retrospective Analysis for Research and Applications (MERRA), we further investigated the chemical nature of the AOD. The results suggest that mineral dust is also an important aerosol component in Córdoba, with maximum impact from November to February. Finally, the use of a long-term dataset allowed a preliminary assessment of AOD trends over the Córdoba region. For the months in which local sources and secondary processes were found to dominate the AOD (December to March), we found a positive AOD trend in the Córdoba outskirts, mainly in the areas with maximum urbanization/population growth over the investigated decade. Conversely, a negative AOD trend (up to -0.1 per decade) is observed over the entire rural area of Córdoba during the BB season, which is attributed to a decrease in fires at both the local and the continental scale.
Affiliation(s)
- Lara Sofía Della Ceca
- Instituto de Altos Estudios Espaciales ‘Mario Gulich’, Universidad Nacional de Córdoba (UNC)/Comisión Nacional de Actividades Espaciales (CONAE), Ruta Provincial C45 a 8 Km, Falda del Cañete, Córdoba, Argentina
- Instituto de Física Rosario (IFIR), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) and Universidad Nacional de Rosario (UNR), Bv 27 de Febrero 210bis, Rosario, Argentina
- María Fernanda García Ferreyra
- Instituto de Altos Estudios Espaciales ‘Mario Gulich’, Universidad Nacional de Córdoba (UNC)/Comisión Nacional de Actividades Espaciales (CONAE), Ruta Provincial C45 a 8 Km, Falda del Cañete, Córdoba, Argentina
- Comisión Nacional de Actividades Espaciales (CONAE), Ruta Provincial C45 a 8 Km, Falda del Cañete, Córdoba, Argentina
- Alexei Lyapustin
- NASA Goddard Space Flight Center, code 613, Greenbelt, Maryland 20771 USA
- Alexandra Chudnovsky
- Department of Geography and Human Environment, School of Geosciences, Faculty of Exact Sciences, Tel-Aviv University, Israel
- Lidia Otero
- Centro de Investigaciones en Láseres y Aplicaciones (CEILAP)-UNIDEF (MINDEF-CONICET) – CITEDEF, Juan Bautista de La Salle 4397 (B1063ALO), Villa Martelli, Buenos Aires, Argentina
- Universidad de la Defensa Nacional, Escuela Superior Técnica Grl Div Manuel N. Savio - Facultad del Ejército, Av. Cabildo 15 (C1426AAA), Ciudad Autónoma de Buenos Aires, Argentina
- Hebe Carreras
- Instituto Multidisciplinario de Biología Vegetal (IMBIV), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) and Departamento de Química, FCEFyN, Universidad Nacional de Córdoba, Av. Velez Sarsfield 299, Córdoba, Argentina
- Francesca Barnaba
- Istituto di Scienze dell’Atmosfera e del Clima, Consiglio Nazionale delle Ricerche (ISAC-CNR), Via Fosso del Cavaliere, 100 – 00133, Rome, Italy