1
Lu T, Kim SY, Marshall JD. High-Resolution Geospatial Database: National Criteria-Air-Pollutant Concentrations in the Contiguous U.S., 2016-2020. GEOSCIENCE DATA JOURNAL 2025; 12:e70005. [PMID: 40256251 PMCID: PMC12007897 DOI: 10.1002/gdj3.70005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Accepted: 03/21/2025] [Indexed: 04/22/2025]
Abstract
Concentration estimates for ambient air pollution are used widely in fields such as environmental epidemiology, health impact assessment, urban planning, environmental equity and sustainability. This study builds on previous efforts by developing an updated high-resolution geospatial database of population-weighted annual-average concentrations for six criteria air pollutants (PM2.5, PM10, CO, NO2, SO2, O3) across the contiguous U.S. during a five-year period (2016-2020). We developed Land Use Regression (LUR) models within a partial-least-squares-universal kriging framework by incorporating several land use, geospatial and satellite-based predictor variables. The LUR models were validated using conventional and clustered cross-validation, with the former consistently showing superior performance in capturing the variability of air quality. Most models demonstrated reliable performance (e.g., mean squared error-based R2 > 0.8, standardised root mean squared error < 0.1). We used the best modelling approach to develop estimates by Census Block, which were then population-weighted averaged at Census Block Group, Census Tract and County geographies. Our database provides valuable insights into the dynamics of air pollution, with utility for environmental risk assessment, public health, policy and urban planning.
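The aggregation step described in the abstract (Census Block estimates population-weighted up to larger geographies) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the tuple layout, and the sample values are hypothetical.

```python
from collections import defaultdict


def population_weighted_mean(blocks):
    """Aggregate block-level concentrations to their parent geographies.

    `blocks` is an iterable of (parent_id, population, concentration) tuples.
    Returns {parent_id: sum(pop * conc) / sum(pop)}. Parents whose blocks
    have zero total population are omitted.
    """
    weighted_sum = defaultdict(float)
    total_pop = defaultdict(float)
    for parent_id, population, concentration in blocks:
        weighted_sum[parent_id] += population * concentration
        total_pop[parent_id] += population
    return {
        parent: weighted_sum[parent] / total_pop[parent]
        for parent in weighted_sum
        if total_pop[parent] > 0
    }


# Hypothetical blocks: (block-group id, residents, PM2.5 in ug/m3)
example_blocks = [
    ("BG-1", 100, 8.0),
    ("BG-1", 300, 10.0),
    ("BG-2", 50, 6.0),
    ("BG-2", 0, 12.0),  # an unpopulated block carries no weight
]
block_group_means = population_weighted_mean(example_blocks)
```

The same function could be reused for tract or county aggregation by changing which parent identifier is attached to each block.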
Affiliation(s)
- Tianjun Lu
  - Department of Epidemiology and Environmental Health, College of Public Health, University of Kentucky, Lexington, Kentucky, USA
- Sun-Young Kim
  - Department of Cancer AI and Digital Health, Graduate School of Cancer Science and Policy, National Cancer Center, Goyang-si, Gyeonggi-do, Korea
- Julian D. Marshall
  - Department of Civil and Environmental Engineering, University of Washington, Seattle, Washington, USA
2
Martenies SE, Oloo A, Magzamen S, Ji N, Khalili R, Kaur S, Xu Y, Yang T, Bastain TM, Breton CV, Farzan SF, Habre R, Dabelea D. Independent and joint effects of neighborhood-level environmental and socioeconomic exposures on body mass index in early childhood: The environmental influences on child health outcomes (ECHO) cohort. ENVIRONMENTAL RESEARCH 2024; 253:119109. [PMID: 38751004 DOI: 10.1016/j.envres.2024.119109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 04/19/2024] [Accepted: 05/07/2024] [Indexed: 05/26/2024]
Abstract
Past studies support the hypothesis that the prenatal period influences childhood growth. However, few studies explore the joint effects of exposures that occur simultaneously during pregnancy. To explore the feasibility of using mixtures methods with neighborhood-level environmental exposures, we assessed the effects of multiple prenatal exposures on body mass index (BMI) from birth to age 24 months. We used data from two cohorts: Healthy Start (n = 977) and Maternal and Developmental Risks from Environmental and Social Stressors (MADRES; n = 303). BMI was measured at delivery and 6, 12, and 24 months and standardized as z-scores. We included variables for air pollutants, built and natural environments, food access, and neighborhood socioeconomic status (SES). We used two complementary statistical approaches: single-exposure linear regression and quantile-based g-computation. Models were fit separately for each cohort and time point and were adjusted for relevant covariates. Single-exposure models identified negative associations between NO2 and distance to parks and positive associations between low neighborhood SES and BMI z-scores for Healthy Start participants; for MADRES participants, we observed negative associations between O3 and distance to parks and BMI z-scores. G-computation models produced comparable results for each cohort: higher exposures were generally associated with lower BMI, although results were not significant. Results from the g-computation models, which do not require a priori knowledge of the direction of associations, indicated that the direction of associations between mixture components and BMI varied by cohort and time point. Our study highlights challenges in assessing mixtures effects at the neighborhood level and in harmonizing exposure data across cohorts. For example, geospatial data of neighborhood-level exposures may not fully capture the qualities that might influence health behavior. 
Studies aiming to harmonize geospatial data from different geographical regions should consider contextual factors when operationalizing exposure variables.
Affiliation(s)
- Sheena E Martenies
  - Kinesiology and Community Health, University of Illinois Urbana-Champaign, Urbana, IL, USA; Division of Nutritional Sciences, University of Illinois Urbana-Champaign, Urbana, IL, USA; Family Resiliency Center, University of Illinois Urbana-Champaign, Urbana, IL, USA
- Alice Oloo
  - Kinesiology and Community Health, University of Illinois Urbana-Champaign, Urbana, IL, USA
- Sheryl Magzamen
  - Environmental and Radiological Health Sciences, Colorado State University, Fort Collins, CO, USA; Epidemiology, Colorado School of Public Health, Aurora, CO, USA
- Nan Ji
  - Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Roxana Khalili
  - Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Simrandeep Kaur
  - Kinesiology and Community Health, University of Illinois Urbana-Champaign, Urbana, IL, USA
- Yan Xu
  - Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA
- Tingyu Yang
  - Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Theresa M Bastain
  - Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Carrie V Breton
  - Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Shohreh F Farzan
  - Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Rima Habre
  - Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA
- Dana Dabelea
  - Epidemiology, Colorado School of Public Health, Aurora, CO, USA; Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Lifecourse Epidemiology of Adiposity and Diabetes (LEAD) Center, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
3
Pietrodangelo A, Bove MC, Forello AC, Crova F, Bigi A, Brattich E, Riccio A, Becagli S, Bertinetti S, Calzolai G, Canepari S, Cappelletti D, Catrambone M, Cesari D, Colombi C, Contini D, Cuccia E, De Gennaro G, Genga A, Ielpo P, Lucarelli F, Malandrino M, Masiol M, Massabò D, Perrino C, Prati P, Siciliano T, Tositti L, Venturini E, Vecchi R. A PM10 chemically characterized nation-wide dataset for Italy. Geographical influence on urban air pollution and source apportionment. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 908:167891. [PMID: 37852492 DOI: 10.1016/j.scitotenv.2023.167891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 10/06/2023] [Accepted: 10/15/2023] [Indexed: 10/20/2023]
Abstract
Urban textures of the Italian cities are peculiarly shaped by the local geography generating similarities among cities placed in different regions but comparable topographical districts. This suggested the following scientific question: can different topographies generate significant differences on the PM10 chemical composition at Italian urban sites that share similar geography despite being in different regions? To investigate whether such commonalities can be found and are applicable at Country-scale, we propose here a novel methodological approach. A dataset comprising season-averages of PM10 mass concentration and chemical composition data was built, covering the decade 2005-2016 and referring to urban sites only (21 cities). Statistical analyses, estimation of missing data, identification of latent clusters and source apportionment modeling by Positive Matrix Factorization (PMF) were performed on this unique dataset. The first original result is the demonstration that a dataset with atypical time resolution can be successfully exploited as an input matrix for PMF obtaining Country-scale representative chemical profiles, whose physical consistency has been assessed by different tests of modeling performance. Secondly, this dataset can be considered a reference repository of season averages of chemical species over the Italian territory and the chemical profiles obtained by PMF for urban Italian agglomerations could contribute to emission repositories. These findings indicate that our approach is powerful, and it could be further employed with datasets typically available in the air pollution monitoring networks.
Affiliation(s)
- Adriana Pietrodangelo
  - C.N.R. Institute of Atmospheric Pollution Research, Monterotondo St., Rome 00015, Italy
- Maria Chiara Bove
  - Ligurian Regional Agency for Environmental Protection (ARPAL), Genoa 16149, Italy
- Federica Crova
  - Department of Physics, University of Milan and INFN-Milan, 20133 Milan, Italy
- Alessandro Bigi
  - Department of Engineering "Enzo Ferrari", University of Modena and Reggio Emilia, Modena 41125, Italy
- Erika Brattich
  - Department of Physics and Astronomy "Augusto Righi", University of Bologna, Bologna 40126, Italy
- Angelo Riccio
  - Department of Science and Technology, University of Naples Parthenope, Naples 80143, Italy
- Silvia Becagli
  - Department of Chemistry "Ugo Schiff", University of Florence, Sesto Fiorentino, Florence 50019, Italy
- Giulia Calzolai
  - National Institute of Nuclear Physics (INFN), Sesto Fiorentino, Florence 50019, Italy
- Silvia Canepari
  - Department of Environmental Biology, Sapienza University of Rome, 00185 Rome, Italy
- David Cappelletti
  - Department of Chemistry, Biology and Biotechnology, University of Perugia, 06123 Perugia, Italy
- Daniela Cesari
  - C.N.R. Institute of Atmospheric Sciences and Climate, ISAC-CNR, Lecce 73100, Italy
- Cristina Colombi
  - Regional Agency for Environmental Protection of Lombardy (ARPA Lombardia), Milan 20124, Italy
- Daniele Contini
  - C.N.R. Institute of Atmospheric Sciences and Climate, ISAC-CNR, Lecce 73100, Italy
- Eleonora Cuccia
  - Regional Agency for Environmental Protection of Lombardy (ARPA Lombardia), Milan 20124, Italy
- Alessandra Genga
  - Department of Biological and Environmental Sciences and Technologies DISTeBA, University of Salento, Lecce 73100, Italy
- Pierina Ielpo
  - C.N.R. Institute of Atmospheric Sciences and Climate, ISAC-CNR, Lecce 73100, Italy
- Franco Lucarelli
  - Department of Physics and Astrophysics, University of Florence and INFN-Florence, Sesto Fiorentino, Florence, 50019, Italy
- Mery Malandrino
  - Department of Chemistry, University of Turin, 10125 Turin, Italy
- Mauro Masiol
  - Department of Environmental Science, Informatics and Statistics, University Ca' Foscari, 30172 Mestre-Venezia, Italy
- Dario Massabò
  - Department of Physics, University of Genoa and INFN-Genoa, 16146 Genoa, Italy
- Cinzia Perrino
  - C.N.R. Institute of Atmospheric Pollution Research, Monterotondo St., Rome 00015, Italy
- Paolo Prati
  - Department of Physics, University of Genoa and INFN-Genoa, 16146 Genoa, Italy
- Tiziana Siciliano
  - Department of Mathematics and Physics "Ennio De Giorgi", University of Salento, Lecce 73100, Italy
- Laura Tositti
  - Department of Chemistry "Giacomo Ciamician", University of Bologna, Bologna, 40126, Italy
- Elisa Venturini
  - Department of Industrial Chemistry "Toso Montanari", University of Bologna, Bologna 40126, Italy
- Roberta Vecchi
  - Department of Physics, University of Milan and INFN-Milan, 20133 Milan, Italy
4
Wei Y, Qiu X, Yazdi MD, Shtein A, Shi L, Yang J, Peralta AA, Coull BA, Schwartz JD. The Impact of Exposure Measurement Error on the Estimated Concentration-Response Relationship between Long-Term Exposure to PM2.5 and Mortality. ENVIRONMENTAL HEALTH PERSPECTIVES 2022; 130:77006. [PMID: 35904519 PMCID: PMC9337229 DOI: 10.1289/ehp10389] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Indexed: 05/06/2023]
Abstract
BACKGROUND Exposure measurement error is a central concern in air pollution epidemiology. Given that studies have been using ambient air pollution predictions as proxy exposure measures, the potential impact of exposure error on health effect estimates needs to be comprehensively assessed. OBJECTIVES We aimed to generate wide-ranging scenarios to assess direction and magnitude of bias caused by exposure errors under plausible concentration-response relationships between annual exposure to fine particulate matter [PM ≤2.5 μm in aerodynamic diameter (PM2.5)] and all-cause mortality. METHODS In this simulation study, we use daily PM2.5 predictions at 1-km2 spatial resolution to estimate annual PM2.5 exposures and their uncertainties for ZIP Codes of residence across the contiguous United States between 2000 and 2016. We consider scenarios in which we vary the error type (classical or Berkson) and the true concentration-response relationship between PM2.5 exposure and mortality (linear, quadratic, or soft-threshold, i.e., a smooth approximation to the hard-threshold model). In each scenario, we generate numbers of deaths using error-free exposures and confounders of concurrent air pollutants and neighborhood-level covariates and perform epidemiological analyses using error-prone exposures under correct specification or misspecification of the concentration-response relationship between PM2.5 exposure and mortality, adjusting for the confounders. RESULTS We simulate 1,000 replicates of each of 162 scenarios investigated. In general, both classical and Berkson errors can bias the concentration-response curve toward the null. The biases remain small even when using three times the predicted uncertainty to generate errors and are relatively larger at higher exposure levels. 
DISCUSSION Our findings suggest that the causal determination for long-term PM2.5 exposure and mortality is unlikely to be undermined when using high-resolution ambient predictions given that the estimated effect is generally smaller than the truth. The small magnitude of bias suggests that epidemiological findings are relatively robust against the exposure error. In practice, the use of ambient predictions with a finer spatial resolution will result in smaller bias. https://doi.org/10.1289/EHP10389.
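The two error types named in the abstract can be illustrated with a deliberately simplified simulation. This sketch is not the paper's design, which simulates death counts under linear, quadratic, and soft-threshold curves; it assumes a plain linear outcome with Gaussian noise, and every name and parameter value is hypothetical. In this linear setting the textbook expectation is that classical error attenuates the fitted slope by roughly var(X) / (var(X) + var(U)), while Berkson error leaves it approximately unbiased; the paper's more complex scenarios report bias toward the null for both types.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, true_slope=1.0, seed=0):
    rng = random.Random(seed)

    # Classical error: measured = true + noise, noise independent of the truth.
    true_x = [rng.gauss(10.0, 2.0) for _ in range(n)]
    measured = [x + rng.gauss(0.0, 1.0) for x in true_x]
    y_classical = [true_slope * x + rng.gauss(0.0, 1.0) for x in true_x]

    # Berkson error: true = assigned + noise, noise independent of the
    # assigned (e.g. area-average) value.
    assigned = [rng.gauss(10.0, 2.0) for _ in range(n)]
    true_b = [w + rng.gauss(0.0, 1.0) for w in assigned]
    y_berkson = [true_slope * x + rng.gauss(0.0, 1.0) for x in true_b]

    # Both analyses regress the outcome on the error-prone exposure.
    return ols_slope(measured, y_classical), ols_slope(assigned, y_berkson)


classical_slope, berkson_slope = simulate()
```

With var(X) = 4 and var(U) = 1, the classical-error slope should land near 0.8 and the Berkson-error slope near the true value of 1.0.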
Affiliation(s)
- Yaguang Wei
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Xinye Qiu
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Mahdieh Danesh Yazdi
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Alexandra Shtein
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Liuhua Shi
  - Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, Georgia, USA
- Jiabei Yang
  - Department of Biostatistics, School of Public Health, Brown University, Providence, Rhode Island, USA
- Adjani A. Peralta
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Brent A. Coull
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Joel D. Schwartz
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
5
Ren X, Mi Z, Cai T, Nolte CG, Georgopoulos PG. Flexible Bayesian Ensemble Machine Learning Framework for Predicting Local Ozone Concentrations. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2022; 56:3871-3883. [PMID: 35312316 PMCID: PMC9133919 DOI: 10.1021/acs.est.1c04076] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 05/25/2023]
Abstract
3D-grid-based chemical transport models, such as the Community Multiscale Air Quality (CMAQ) modeling system, have been widely used for predicting concentrations of ambient air pollutants. However, typical horizontal resolutions of nationwide CMAQ simulations (12 × 12 km2) cannot capture local-scale gradients for accurately assessing human exposures and environmental justice disparities. In this study, a Bayesian ensemble machine learning (BEML) framework, which integrates 13 learning algorithms, was developed for downscaling CMAQ estimates of ozone daily maximum 8 h averages to the census tract level, across the contiguous US, and was demonstrated for 2011. Three-stage hyperparameter tuning and targeted validations were designed to ensure the ensemble model's ability to interpolate, extrapolate, and capture concentration peaks. The Shapley value metric from coalitional game theory was applied to interpret the drivers of subgrid gradients. The flexibility (transferability) of the 2011-trained BEML model was further tested by evaluating its ability to estimate fine-scale concentrations for other years (2012-2017) without retraining. To demonstrate the feasibility of using the BEML approach to strictly "data-limited" situations, the model was applied to downscale CMAQ outputs for a future-year scenario-based simulation that considers effects of variations in meteorology associated with climate change.
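The Shapley value mentioned in the abstract assigns each input its average marginal contribution over all orderings in which inputs could be added. The sketch below computes it exactly for a toy value function. It is only an illustration of the game-theoretic definition, not the BEML framework's interpretation code: the feature names, the value function, and its numbers are all hypothetical, and practical tools approximate this sum because exact enumeration grows factorially.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


def toy_value(coalition):
    """Hypothetical predicted ozone increment (ppb) given the known features."""
    v = 0.0
    if "temperature" in coalition:
        v += 4.0
    if "traffic" in coalition:
        v += 2.0
    if "temperature" in coalition and "traffic" in coalition:
        v += 1.0  # interaction term, split equally by the Shapley value
    return v


features = ["temperature", "traffic", "elevation"]
attributions = shapley_values(features, toy_value)
```

The attributions sum to the value of the full feature set, which is the efficiency property that makes the metric convenient for decomposing a prediction.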
Affiliation(s)
- Xiang Ren
  - Environmental and Occupational Health Sciences Institute (EOHSI), Rutgers University, Piscataway, NJ 08854, USA
  - Department of Chemical and Biochemical Engineering, Rutgers University, Piscataway, NJ 08854, USA
- Zhongyuan Mi
  - Environmental and Occupational Health Sciences Institute (EOHSI), Rutgers University, Piscataway, NJ 08854, USA
  - Department of Environmental Sciences, Rutgers University, New Brunswick, NJ 08901, USA
- Ting Cai
  - Environmental and Occupational Health Sciences Institute (EOHSI), Rutgers University, Piscataway, NJ 08854, USA
- Christopher G. Nolte
  - Center for Environmental Measurement and Modeling, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Panos G. Georgopoulos
  - Environmental and Occupational Health Sciences Institute (EOHSI), Rutgers University, Piscataway, NJ 08854, USA
  - Department of Chemical and Biochemical Engineering, Rutgers University, Piscataway, NJ 08854, USA
  - Department of Environmental Sciences, Rutgers University, New Brunswick, NJ 08901, USA
  - Department of Environmental and Occupational Health and Justice, Rutgers School of Public Health, Piscataway, NJ 08854, USA
6
Zhang X, Just AC, Hsu HHL, Kloog I, Woody M, Mi Z, Rush J, Georgopoulos P, Wright RO, Stroustrup A. A hybrid approach to predict daily NO2 concentrations at city block scale. THE SCIENCE OF THE TOTAL ENVIRONMENT 2021; 761:143279. [PMID: 33162146 DOI: 10.1016/j.scitotenv.2020.143279] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/19/2020] [Revised: 10/12/2020] [Accepted: 10/19/2020] [Indexed: 06/11/2023]
Abstract
Estimating the ambient concentration of nitrogen dioxide (NO2) is challenging because NO2 generated by local fossil fuel combustion varies greatly in concentration across space and time. This study demonstrates an integrated hybrid approach combining dispersion modeling and land use regression (LUR) to predict daily NO2 concentrations at a high spatial resolution (e.g., 50 m) in the New York tri-state area. The daily concentration of traffic-related NO2 was estimated at the Environmental Protection Agency's NO2 monitoring sites in the study area for the years 2015-2017, using the Research LINE source (R-LINE) model with inputs of traffic data provided by the Highway Performance and Management System and meteorological data provided by the NOAA Integrated Surface Database. We used the R-LINE-predicted daily concentrations of NO2 to build mixed-effects regression models, including additional variables representing land use features, geographic characteristics, weather, and other predictors. The mixed model was selected by the Elastic Net method. Each model's performance was evaluated using the out-of-sample coefficient of determination (R2) and the square root of mean squared error (RMSE) from ten-fold cross-validation (CV). The mixed model showed a good prediction performance (CV R2: 0.75-0.79, RMSE: 3.9-4.0 ppb). R-LINE outputs improved the overall, spatial, and temporal CV R2 by 10.0%, 18.9% and 7.7% respectively. Given the output of R-LINE is point-based and has a flexible spatial resolution, this hybrid approach allows prediction of daily NO2 at an extremely high spatial resolution such as city blocks.
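The evaluation metrics named in the abstract, out-of-sample R2 and RMSE from ten-fold cross-validation, can be sketched as below. This is an assumption-laden illustration: a one-predictor least-squares line stands in for the paper's Elastic Net-selected mixed-effects model, R2 is taken as 1 - MSE / variance of the observations (the abstract does not give the formula), and the synthetic data and all names are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validate(x, y, k=10, seed=0):
    """k-fold CV: each point is predicted by a model that never saw it."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    held_out = [0.0] * len(x)
    for fold in folds:
        test = set(fold)
        train = [i for i in indices if i not in test]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            held_out[i] = intercept + slope * x[i]
    mse = sum((o - p) ** 2 for o, p in zip(y, held_out)) / len(y)
    mean_y = sum(y) / len(y)
    variance = sum((o - mean_y) ** 2 for o in y) / len(y)
    return {"rmse": math.sqrt(mse), "r2": 1.0 - mse / variance}


# Synthetic stand-in: a dispersion-model output (ppb) as the only predictor.
rng = random.Random(1)
dispersion = [rng.uniform(0.0, 50.0) for _ in range(200)]
observed = [5.0 + 0.8 * d + rng.gauss(0.0, 3.0) for d in dispersion]
cv_metrics = cross_validate(dispersion, observed)
```

Spatial or temporal CV variants, as reported in the abstract, would differ only in how the folds are formed, for example by holding out whole monitoring sites or whole days.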
Affiliation(s)
- Xueying Zhang
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Allan C Just
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Hsiao-Hsien Leon Hsu
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Itai Kloog
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA; The Department of Geography and Environmental Development, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Matthew Woody
  - U.S. Environmental Protection Agency, Research Triangle Park, NC, USA
- Zhongyuan Mi
  - Computational Chemodynamics Laboratory, Environmental and Occupational Health Science Institute, Rutgers University, New Brunswick, NJ, USA
- Johnathan Rush
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Panos Georgopoulos
  - Computational Chemodynamics Laboratory, Environmental and Occupational Health Science Institute, Rutgers University, New Brunswick, NJ, USA
- Robert O Wright
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Annemarie Stroustrup
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Neonatology, Department of Pediatrics, Cohen Children's Medical Center at Northwell Health, New Hyde Park, NY, USA
7
Girguis MS, Li L, Lurmann F, Wu J, Breton C, Gilliland F, Stram D, Habre R. Exposure Measurement Error in Air Pollution Studies: The Impact of Shared, Multiplicative Measurement Error on Epidemiological Health Risk Estimates. AIR QUALITY, ATMOSPHERE, & HEALTH 2020; 13:631-643. [PMID: 32601528 PMCID: PMC7323995 DOI: 10.1007/s11869-020-00826-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/21/2019] [Accepted: 04/08/2020] [Indexed: 05/29/2023]
Abstract
Spatiotemporal air pollution models are increasingly being used to estimate health effects in epidemiological studies. Although such exposure prediction models typically result in improved spatial and temporal resolution of air pollution predictions, they remain subject to shared measurement error, a type of measurement error common in spatiotemporal exposure models which occurs when measurement error is not independent of exposures. A fundamental challenge of exposure measurement error in air pollution assessment is the strong correlation and sometimes identical (shared) error of exposure estimates across geographic space and time. When exposure estimates with shared measurement error are used to estimate health risk in epidemiological analyses, complex errors are potentially introduced, resulting in biased epidemiological conclusions. We demonstrate the influence of using a three-stage spatiotemporal exposure prediction model and introduce formal methods of shared, multiplicative measurement error (SMME) correction of epidemiological health risk estimates. Using our three-stage, ensemble learning based nitrogen oxides (NOx) exposure prediction model, we quantified SMME. We conducted an epidemiological analysis of wheeze risk in relation to NOx exposure among school-aged children. To demonstrate the incremental influence of exposure modeling stage, we iteratively estimated the health risk using assigned exposure predictions from each stage of the NOx model. We then determined the impact of SMME on the variance of the health risk estimates under various scenarios. Depending on the stage of the spatiotemporal exposure model used, we found that wheeze odds ratio ranged from 1.16 to 1.28 for an interquartile range increase in NOx. With each additional stage of exposure modeling, the health effect estimate moved further away from the null (OR=1). 
When corrected for observed SMME, the health effects confidence intervals slightly lengthened, but our epidemiological conclusions were not altered. When the variance estimate was corrected for the potential "worst case scenario" of SMME, the standard error further increased, having a meaningful influence on epidemiological conclusions. Our framework can be expanded and used to understand the implications of using exposure predictions subject to shared measurement error in future health investigations.
Affiliation(s)
- Mariam S Girguis
  - Division of Environmental Health, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Lianfa Li
  - Division of Environmental Health, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Jun Wu
  - Program in Public Health, Susan and Henry Samueli College of Health Sciences, University of California, Irvine, CA, USA
- Carrie Breton
  - Division of Environmental Health, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Frank Gilliland
  - Division of Environmental Health, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Daniel Stram
  - Division of Biostatistics, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Rima Habre
  - Division of Environmental Health, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
8
Cromar KR, Duncan BN, Bartonova A, Benedict K, Brauer M, Habre R, Hagler GSW, Haynes JA, Khan S, Kilaru V, Liu Y, Pawson S, Peden DB, Quint JK, Rice MB, Sasser EN, Seto E, Stone SL, Thurston GD, Volckens J. Air Pollution Monitoring for Health Research and Patient Care. An Official American Thoracic Society Workshop Report. Ann Am Thorac Soc 2019; 16:1207-1214. [PMID: 31573344 PMCID: PMC6812167 DOI: 10.1513/annalsats.201906-477st] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 12/19/2022]
Abstract
Air quality data from satellites and low-cost sensor systems, together with output from air quality models, have the potential to augment high-quality, regulatory-grade data in countries with in situ monitoring networks and provide much-needed air quality information in countries without them. Each of these technologies has strengths and limitations that need to be considered when integrating them to develop a robust and diverse global air quality monitoring network. To address these issues, the American Thoracic Society, the U.S. Environmental Protection Agency, the National Aeronautics and Space Administration, and the National Institute of Environmental Health Sciences convened a workshop in May 2017 to bring together global experts from across multiple disciplines and agencies to discuss current and near-term capabilities to monitor global air pollution. The participants focused on four topics: 1) current and near-term capabilities in air pollution monitoring, 2) data assimilation from multiple technology platforms, 3) critical issues for air pollution monitoring in regions without a regulatory-quality stationary monitoring network, and 4) risk communication and health messaging. Recommendations for research and improved use were identified during the workshop, including a recognition that the integration of data across monitoring technology groups is critical to maximizing the effectiveness (e.g., data accuracy, as well as spatial and temporal coverage) of these monitoring technologies. Taken together, these recommendations will advance the development of a global air quality monitoring network that takes advantage of emerging technologies to ensure the availability of free, accessible, and reliable air pollution data and forecasts to health professionals, as well as to all global citizens.
9
Li L, Girguis M, Lurmann F, Wu J, Urman R, Rappaport E, Ritz B, Franklin M, Breton C, Gilliland F, Habre R. Cluster-based bagging of constrained mixed-effects models for high spatiotemporal resolution nitrogen oxides prediction over large regions. ENVIRONMENT INTERNATIONAL 2019; 128:310-323. [PMID: 31078000 PMCID: PMC6538277 DOI: 10.1016/j.envint.2019.04.057] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/18/2018] [Revised: 04/24/2019] [Accepted: 04/24/2019] [Indexed: 05/29/2023]
Abstract
BACKGROUND Accurate estimation of nitrogen dioxide (NO2) and nitrogen oxide (NOx) concentrations at high spatiotemporal resolutions is crucial for improving evaluation of their health effects, particularly with respect to short-term exposures and acute health outcomes. For estimation over large regions like California, high spatial density field campaign measurements can be combined with more sparse routine monitoring network measurements to capture spatiotemporal variability of NO2 and NOx concentrations. However, monitors in spatially dense field sampling are often highly clustered and their uneven distribution creates a challenge for such combined use. Furthermore, heterogeneities due to seasonal patterns of meteorology and source mixtures between sub-regions (e.g. southern vs. northern California) need to be addressed. OBJECTIVES In this study, we aim to develop highly accurate and adaptive machine learning models to predict high-resolution NO2 and NOx concentrations over large geographic regions using measurements from different sources that contain samples with heterogeneous spatiotemporal distributions and clustering patterns. METHODS We used a comprehensive Kruskal-K-means method to cluster the measurement samples from multiple heterogeneous sources. Spatiotemporal cluster-based bootstrap aggregating (bagging) of the base mixed-effects models was then applied, leveraging the clusters to obtain balanced and less correlated training samples for less bias and improvement in generalization. Further, we used the machine learning technique of grid search to find the optimal interaction of temporal basis functions and the scale of spatial effects, which, together with spatiotemporal covariates, adequately captured spatiotemporal variability in NO2 and NOx at the state and local levels. RESULTS We found an optimal combination of four temporal basis functions and 200 m scale spatial effects for the base mixed-effects models. 
With the cluster-based bagging of the base models, we obtained robust predictions with an ensemble cross validation R2 of 0.88 for both NO2 and NOx [RMSE (RMSEIQR): 3.62 ppb (0.28) and 9.63 ppb (0.37) respectively]. In independent tests of random sampling, our models achieved similarly strong performance (R2 of 0.87-0.90; RMSE of 3.97-9.69 ppb; RMSEIQR of 0.21-0.27), illustrating minimal over-fitting. CONCLUSIONS Our approach has important implications for fusing data from highly clustered and heterogeneous measurement samples from multiple data sources to produce highly accurate concentration estimates of air pollutants such as NO2 and NOx at high resolution over a large region.
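The cluster-based bagging idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain least-squares models stand in for their constrained mixed-effects base models, the cluster labels are assumed to be given (the paper derives them with a Kruskal-K-means method), and the function names `cluster_balanced_bagging`, `bagged_predict` and `rmse_iqr` are hypothetical. The sketch also assumes that RMSEIQR denotes RMSE divided by the interquartile range of the observations.

```python
import numpy as np

def cluster_balanced_bagging(X, y, clusters, n_models=25, per_cluster=20, seed=0):
    """Fit one least-squares model per cluster-balanced bootstrap sample.

    Illustrative stand-in only: the base learner here is ordinary least
    squares, not a constrained mixed-effects model.
    """
    rng = np.random.default_rng(seed)
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    labels = np.unique(clusters)
    coefs = []
    for _ in range(n_models):
        # Draw the same number of rows (with replacement) from every cluster,
        # so densely sampled clusters do not dominate a training sample.
        idx = np.concatenate([
            rng.choice(np.flatnonzero(clusters == c), size=per_cluster, replace=True)
            for c in labels
        ])
        beta, *_ = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)
        coefs.append(beta)
    return np.array(coefs)  # shape: (n_models, n_features + 1)

def bagged_predict(coefs, X):
    """Average the predictions of all base models (the bagging step)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return (Xb @ coefs.T).mean(axis=1)

def rmse_iqr(y_true, y_pred):
    """Return (RMSE, RMSE / IQR of the observations); the ratio is an assumed definition."""
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    q75, q25 = np.percentile(y_true, [75, 25])
    return rmse, rmse / float(q75 - q25)
```

Sampling an equal number of rows per cluster is only one way to balance the training samples; other weighting schemes would fit the same description.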
Affiliation(s)
- Lianfa Li
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA; State Key Laboratory of Resources and Environmental Information System, Institute of Geographical Sciences and Natural Resources, Chinese Academy of Sciences, Beijing, China.
- Mariam Girguis
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Jun Wu
- Program in Public Health, Susan and Henry Samueli College of Health Sciences, University of California, Irvine, CA, USA
- Robert Urman
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Edward Rappaport
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Beate Ritz
- Departments of Epidemiology and Environmental Health, Fielding School of Public Health, University of California, Los Angeles, CA, USA
- Meredith Franklin
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Carrie Breton
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Frank Gilliland
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Rima Habre
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
10
Bastain TM, Chavez T, Habre R, Girguis MS, Grubbs B, Toledo-Corral C, Amadeus M, Farzan SF, Al-Marayati L, Lerner D, Noya D, Quimby A, Twogood S, Wilson M, Chatzi L, Cousineau M, Berhane K, Eckel SP, Lurmann F, Johnston J, Dunton GF, Gilliland F, Breton C. Study Design, Protocol and Profile of the Maternal And Developmental Risks from Environmental and Social Stressors (MADRES) Pregnancy Cohort: a Prospective Cohort Study in Predominantly Low-Income Hispanic Women in Urban Los Angeles. BMC Pregnancy Childbirth 2019; 19:189. [PMID: 31146718 PMCID: PMC6543670 DOI: 10.1186/s12884-019-2330-7] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Received: 12/21/2018] [Accepted: 05/03/2019] [Indexed: 12/14/2022]
Abstract
BACKGROUND The burden of childhood and adult obesity disproportionally affects Hispanic and African-American populations in the US, and these groups as well as populations with lower income and education levels are disproportionately affected by environmental pollution. Pregnancy is a critical developmental period where maternal exposures may have significant impacts on infant and childhood growth as well as the future health of the mother. We initiated the "Maternal And Developmental Risks from Environmental and Social Stressors (MADRES)" cohort study to address critical gaps in understanding the increased risk for childhood obesity and maternal obesity outcomes among minority and low-income women in urban Los Angeles. METHODS The MADRES cohort is specifically examining whether pre- and postpartum environmental exposures, in addition to exposures to psychosocial and built environment stressors, lead to excessive gestational weight gain and postpartum weight retention in women and to perturbed infant growth trajectories and increased childhood obesity risk through altered psychological, behavioral and/or metabolic responses. The ongoing MADRES study is a prospective pregnancy cohort of 1000 predominantly lower-income, Hispanic women in Los Angeles, CA. Enrollment in the MADRES cohort is initiated prior to 30 weeks gestation from partner community health clinics in Los Angeles. Cohort participants are followed through their pregnancies, at birth, and during the infant's first year of life through a series of in-person visits with interviewer-administered questionnaires, anthropometric measurements and biospecimen collection as well as telephone interviews conducted with the mother. 
DISCUSSION In this paper, we outline the study rationale and data collection protocol for the MADRES cohort, and we present a profile of demographic, health and exposure characteristics for 291 participants who have delivered their infants, out of 523 participants enrolled in the study from November 2015 to October 2018 from four community health clinics in Los Angeles. Results from the MADRES cohort could provide a powerful rationale for regulation of targeted chemical environmental components, better transportation and urban design policies, and clinical recommendations for stress-coping strategies and behavior to reduce lifelong obesity risk.
Affiliation(s)
- Theresa M. Bastain
- Department of Preventive Medicine, USC Keck School of Medicine, University of Southern California, 2001 N. Soto Street, M/C 9237, Los Angeles, CA 90032 USA
- Thomas Chavez
- Department of Preventive Medicine, USC Keck School of Medicine, University of Southern California, 2001 N. Soto Street, M/C 9237, Los Angeles, CA 90032 USA
- Rima Habre
- Department of Preventive Medicine, USC Keck School of Medicine, University of Southern California, 2001 N. Soto Street, M/C 9237, Los Angeles, CA 90032 USA
- Mariam S. Girguis
- Department of Preventive Medicine, USC Keck School of Medicine, University of Southern California, 2001 N. Soto Street, M/C 9237, Los Angeles, CA 90032 USA
- Brendan Grubbs
- Department of Obstetrics and Gynecology, University of Southern California, Los Angeles, CA USA
- Claudia Toledo-Corral
- Department of Preventive Medicine, USC Keck School of Medicine, University of Southern California, 2001 N. Soto Street, M/C 9237, Los Angeles, CA 90032 USA
- Department of Public Health, California State University Northridge, Los Angeles, CA USA
- Milena Amadeus
- Department of Preventive Medicine, USC Keck School of Medicine, University of Southern California, 2001 N. Soto Street, M/C 9237, Los Angeles, CA 90032 USA
- Shohreh F. Farzan
- Department of Preventive Medicine, USC Keck School of Medicine, University of Southern California, 2001 N. Soto Street, M/C 9237, Los Angeles, CA 90032 USA
- Laila Al-Marayati
- Department of Obstetrics and Gynecology, University of Southern California, Los Angeles, CA USA
- Eisner Health, Los Angeles, CA USA
- David Noya
- South Central Family Health Center, Los Angeles, CA USA
- Alyssa Quimby
- Department of Obstetrics and Gynecology, University of Southern California, Los Angeles, CA USA
- Sara Twogood
- Department of Obstetrics and Gynecology, University of Southern California, Los Angeles, CA USA
- Melissa Wilson
- Department of Preventive Medicine, USC Keck School of Medicine, University of Southern California, 2001 N. Soto Street, M/C 9237, Los Angeles, CA 90032 USA
- Leda Chatzi
- Department of Preventive Medicine, USC Keck School of Medicine, University of Southern California, 2001 N. Soto Street, M/C 9237, Los Angeles, CA 90032 USA
- Michael Cousineau
- Department of Preventive Medicine, USC Keck School of Medicine, University of Southern California, 2001 N. Soto Street, M/C 9237, Los Angeles, CA 90032 USA
- Kiros Berhane
- Department of Preventive Medicine, USC Keck School of Medicine, University of Southern California, 2001 N. Soto Street, M/C 9237, Los Angeles, CA 90032 USA
- Sandrah P. Eckel
- Department of Preventive Medicine, USC Keck School of Medicine, University of Southern California, 2001 N. Soto Street, M/C 9237, Los Angeles, CA 90032 USA
- Jill Johnston
- Department of Preventive Medicine, USC Keck School of Medicine, University of Southern California, 2001 N. Soto Street, M/C 9237, Los Angeles, CA 90032 USA
- Genevieve F. Dunton
- Department of Preventive Medicine, USC Keck School of Medicine, University of Southern California, 2001 N. Soto Street, M/C 9237, Los Angeles, CA 90032 USA
- Department of Psychology, University of Southern California, Los Angeles, CA USA
- Frank Gilliland
- Department of Preventive Medicine, USC Keck School of Medicine, University of Southern California, 2001 N. Soto Street, M/C 9237, Los Angeles, CA 90032 USA
- Carrie Breton
- Department of Preventive Medicine, USC Keck School of Medicine, University of Southern California, 2001 N. Soto Street, M/C 9237, Los Angeles, CA 90032 USA
11
Girguis MS, Li L, Lurmann F, Wu J, Urman R, Rappaport E, Breton C, Gilliland F, Stram D, Habre R. Exposure measurement error in air pollution studies: A framework for assessing shared, multiplicative measurement error in ensemble learning estimates of nitrogen oxides. ENVIRONMENT INTERNATIONAL 2019; 125:97-106. [PMID: 30711654 PMCID: PMC6499078 DOI: 10.1016/j.envint.2018.12.025] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 09/07/2018] [Revised: 12/10/2018] [Accepted: 12/12/2018] [Indexed: 05/22/2023]
Abstract
BACKGROUND Increasingly, ensemble learning-based spatiotemporal models are being used to estimate residential air pollution exposures in epidemiological studies. While these machine learning models typically have improved performance, they suffer from exposure measurement error that is inherent in all models. Our objective is to develop a framework to formally assess shared, multiplicative measurement error (SMME) in our previously published three-stage, ensemble learning-based nitrogen oxides (NOx) model to identify its spatial and temporal patterns and predictors. METHODS By treating the ensembles as an external dosimetry system, we quantified shared and unshared, multiplicative and additive (SUMA) measurement error components in our exposure model. We used generalized additive models (GAMs) with a smooth term for location to identify geographic locations with significantly elevated SMME and explain their spatial and temporal determinants. RESULTS We found evidence of significant shared and unshared multiplicative error (p < 0.0001) in our ensemble learning-based spatiotemporal NOx model predictions. Unshared multiplicative error was 26 times larger than SMME. We observed significant geographic (p < 0.0001) and temporal variation in SMME, with the majority (43%) of predictions with elevated SMME occurring in the earliest time period (1992-2000). Densely populated urban prediction regions with complex air pollution sources generally exhibited the highest odds of elevated SMME. CONCLUSIONS We developed a novel statistical framework to formally evaluate the magnitude and drivers of SMME in ensemble learning-based exposure models. Our framework can be used to inform the development of improved exposure models.
Affiliation(s)
- Mariam S Girguis
- Division of Environmental Health, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
- Lianfa Li
- Division of Environmental Health, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Jun Wu
- Department of Public Health, College of Health Sciences, University of California, Irvine, CA, USA
- Robert Urman
- Division of Environmental Health, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Edward Rappaport
- Division of Environmental Health, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Carrie Breton
- Division of Environmental Health, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Frank Gilliland
- Division of Environmental Health, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Daniel Stram
- Division of Biostatistics, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Rima Habre
- Division of Environmental Health, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
12
Machine Learning Approaches for Outdoor Air Quality Modelling: A Systematic Review. APPLIED SCIENCES-BASEL 2018. [DOI: 10.3390/app8122570] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Indexed: 11/16/2022]
Abstract
Current studies show that traditional deterministic models tend to struggle to capture the non-linear relationship between the concentration of air pollutants and their sources of emission and dispersion. To tackle such a limitation, the most promising approach is to use statistical models based on machine learning techniques. Nevertheless, it is puzzling why a certain algorithm is chosen over another for a given task. This systematic review intends to clarify this question by providing the reader with a comprehensive description of the principles underlying these algorithms and how they are applied to enhance prediction accuracy. A rigorous search that conforms to the PRISMA guideline is performed and results in the selection of the 46 most relevant journal papers in the area. Through a factorial analysis method, these studies are synthesized and linked to each other. The main findings of this literature review show that: (i) machine learning is mainly applied on the Eurasian and North American continents and (ii) estimation problems tend to implement Ensemble Learning and Regressions, whereas forecasting makes use of Neural Networks and Support Vector Machines. The next challenges of this approach are to improve the prediction of pollution peaks and contaminants recently put in the spotlight (e.g., nanoparticles).
13
Lin Y, Stripelis D, Chiang YY, Ambite JL, Habre R, Pan F, Eckel SP. Mining Public Datasets for Modeling Intra-City PM 2.5 Concentrations at a Fine Spatial Resolution. PROCEEDINGS OF THE ... ACM SIGSPATIAL INTERNATIONAL CONFERENCE ON ADVANCES IN GEOGRAPHIC INFORMATION SYSTEMS : ACM GIS. ACM SIGSPATIAL INTERNATIONAL CONFERENCE ON ADVANCES IN GEOGRAPHIC INFORMATION SYSTEMS 2017; 2017:25. [PMID: 29527599 PMCID: PMC5841919 DOI: 10.1145/3139958.3140013] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 10/28/2022]
Abstract
Air quality models are important for studying the impact of air pollutants on health conditions at a fine spatiotemporal scale. Existing work typically relies on area-specific, expert-selected attributes of pollution emissions (e.g., transportation) and dispersion (e.g., meteorology) for building the model for each combination of study areas, pollutant types, and spatiotemporal scales. In this paper, we present a data mining approach that utilizes publicly available OpenStreetMap (OSM) data to automatically generate an air quality model for the concentrations of fine particulate matter less than 2.5 μm in aerodynamic diameter at various temporal scales. Our experiment shows that our (domain-) expert-free model could generate accurate PM2.5 concentration predictions, which can be used to improve air quality models that traditionally rely on expert-selected input. Our approach also quantifies the impact on air quality from a variety of geographic features (i.e., how various types of geographic features such as parking lots and commercial buildings affect air quality and from what distance) representing mobile, stationary and area natural and anthropogenic air pollution sources. This approach is particularly important for enabling the construction of context-specific spatiotemporal models of air pollution, allowing investigations of the impact of air pollution exposures on sensitive populations such as children with asthma at scale.
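One common way to turn geographic features into model inputs of the kind this abstract describes ("what type of feature, and from what distance") is to count features of each type within several buffer radii around a monitoring site. The sketch below is an illustration under assumptions, not the authors' pipeline: the point-only feature representation, the buffer radii, and the names `haversine_m` and `buffer_counts` are all hypothetical.

```python
import math
from collections import Counter

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def buffer_counts(site, features, radii_m=(100, 500, 1000)):
    """Count features of each kind within each buffer radius of a site.

    site     -- (lat, lon) of the monitoring location
    features -- iterable of (kind, lat, lon) tuples, e.g. ("parking", 34.0, -118.0)
    Returns a dict keyed by (kind, radius_m); absent keys mean a count of zero.
    """
    counts = Counter()
    for kind, lat, lon in features:
        d = haversine_m(site[0], site[1], lat, lon)
        for radius in radii_m:
            if d <= radius:
                counts[(kind, radius)] += 1
    return dict(counts)
```

Each `(kind, radius)` count can then serve as one candidate predictor, leaving it to the model fitting step to decide which feature types and distances matter.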
Affiliation(s)
- Yijun Lin
- Spatial Sciences Institute, University of Southern California
- Yao-Yi Chiang
- Spatial Sciences Institute, University of Southern California
- José Luis Ambite
- Information Sciences Institute, University of Southern California
- Rima Habre
- Department of Preventive Medicine, University of Southern California
- Fan Pan
- Spatial Sciences Institute, University of Southern California
- Sandrah P Eckel
- Department of Preventive Medicine, University of Southern California