1
Spoer BR, Chen AS, Lampe TM, Nelson IS, Vierse A, Zazanis NV, Kim B, Thorpe LE, Subramanian SV, Gourevitch MN. Validation of a geospatial aggregation method for congressional districts and other US administrative geographies. SSM Popul Health 2023; 24:101511. [PMID: 37711359 PMCID: PMC10498302 DOI: 10.1016/j.ssmph.2023.101511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 09/01/2023] [Accepted: 09/03/2023] [Indexed: 09/16/2023] Open
Abstract
Stakeholders need data on health and drivers of health parsed to the boundaries of essential policy-relevant geographies. US Congressional Districts are an example of a policy-relevant geography that generally lacks health data. One strategy to generate Congressional District health metric estimates is to aggregate estimates from other geographies, for example, from counties or census tracts to Congressional Districts. Doing so requires several methodological decisions. We refine a method to aggregate health metric estimates from one geography to another, using a population-weighted approach. The method's accuracy is evaluated by comparing three aggregated metric estimates to metric estimates from the US Census American Community Survey for the same years: Broadband Access, High School Completion, and Unemployment. We then conducted four sensitivity analyses testing the effect of aggregating counts vs. percentages, the impacts of component geography size and data missingness, and the extent of population overlap between component and target geographies. Aggregated estimates were very similar to estimates for identical metrics drawn directly from the data source. Sensitivity analyses suggest the following best practices for Congressional District-based metrics: utilizing smaller, more plentiful geographies like census tracts as opposed to larger, less plentiful geographies like counties, despite potential for less stable estimates in smaller geographies; and favoring geographies with higher percentage population overlap.
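The population-weighted aggregation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `aggregate_percentages`, the input layout, and the rule of skipping components with missing values are assumptions made for illustration only. Each component geography (for example, a census tract) contributes to a target geography in proportion to the share of its population living inside that target.

```python
from collections import defaultdict

def aggregate_percentages(components, overlaps):
    """Population-weighted mean of a percentage metric per target geography.

    components: {component_id: (population, percent or None)}
    overlaps:   iterable of (component_id, target_id, share), where share is
                the fraction of the component's population inside the target.
    Components with a missing percent (None) are skipped (an assumption).
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for component_id, target_id, share in overlaps:
        population, percent = components[component_id]
        if percent is None:
            continue  # missing data: component contributes no weight
        weight = population * share
        weighted_sum[target_id] += weight * percent
        weight_total[target_id] += weight
    return {
        target_id: weighted_sum[target_id] / weight_total[target_id]
        for target_id in weight_total
        if weight_total[target_id] > 0
    }

# Hypothetical tracts and districts, for illustration only.
tracts = {"A": (1000, 80.0), "B": (3000, 60.0), "C": (2000, 50.0)}
tract_to_district = [
    ("A", "CD1", 1.0),
    ("B", "CD1", 0.5),
    ("B", "CD2", 0.5),
    ("C", "CD2", 1.0),
]
print(aggregate_percentages(tracts, tract_to_district))
```

Aggregating counts instead of percentages would sum `count * share` per target and divide by the summed denominators, which is one of the choices the sensitivity analyses compare.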
Affiliation(s)
- Ben R. Spoer
  - New York University Grossman School of Medicine, Department of Population Health, Division of Epidemiology, New York, NY, USA
- Alexander S. Chen
  - New York University Grossman School of Medicine, Department of Population Health, Division of Epidemiology, New York, NY, USA
- Taylor M. Lampe
  - New York University Grossman School of Medicine, Department of Population Health, Division of Epidemiology, New York, NY, USA
- Isabel S. Nelson
  - New York University Grossman School of Medicine, Department of Population Health, Division of Epidemiology, New York, NY, USA
- Anne Vierse
  - New York University Grossman School of Medicine, Department of Population Health, Division of Epidemiology, New York, NY, USA
- Noah V. Zazanis
  - New York University Grossman School of Medicine, Department of Population Health, Division of Epidemiology, New York, NY, USA
- Byoungjun Kim
  - New York University Grossman School of Medicine, Department of Population Health, Division of Epidemiology, New York, NY, USA
- Lorna E. Thorpe
  - New York University Grossman School of Medicine, Department of Population Health, Division of Epidemiology, New York, NY, USA
- Subu V. Subramanian
  - Harvard T.H. Chan School of Public Health, Department of Social and Behavioral Sciences, Boston, MA, USA
- Marc N. Gourevitch
  - New York University Grossman School of Medicine, Department of Population Health, New York, NY, USA
2
Szarka N, Biljecki F. Population estimation beyond counts-Inferring demographic characteristics. PLoS One 2022; 17:e0266484. [PMID: 35381028 PMCID: PMC8982831 DOI: 10.1371/journal.pone.0266484] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/13/2021] [Accepted: 03/21/2022] [Indexed: 11/18/2022] Open
Abstract
Mapping population distribution at a fine spatial scale is essential for urban studies and planning. Numerous studies, mainly supported by geospatial and statistical methods, have focused primarily on predicting population counts. However, estimating their socio-economic characteristics beyond population counts, such as average age, income, and gender ratio, remains unattended. We enhance traditional population estimation by predicting not only the number of residents in an area, but also their demographic characteristics: average age and the proportion of seniors. By implementing and comparing different machine learning techniques (Random Forest, Support Vector Machines, and Linear Regression) in administrative areas in Singapore, we investigate the use of point of interest (POI) and real estate data for this purpose. The developed regression model predicts the average age of residents in a neighbourhood with a mean error of about 1.5 years (the range of average resident age across Singaporean districts spans approx. 14 years). The results reveal that age patterns of residents can be predicted using real estate information rather than with amenities, which is in contrast to estimating population counts. Another contribution of our work in population estimation is the use of previously unexploited POI and real estate datasets for it, such as property transactions, year of construction, and flat types (number of rooms). Advancing the domain of population estimation, this study reveals the prospects of a small set of detailed and strong predictors that might have the potential of estimating other demographic characteristics such as income.
Affiliation(s)
- Noée Szarka
  - School of GeoSciences, University of Edinburgh, Edinburgh, United Kingdom
  - Department of Architecture, National University of Singapore, Singapore, Singapore
- Filip Biljecki
  - Department of Architecture, National University of Singapore, Singapore, Singapore
  - Department of Real Estate, National University of Singapore, Singapore, Singapore
3
Modeling Population Density using a New Index Derived from Multi-Sensor Image Data. Remote Sensing 2019. [DOI: 10.3390/rs11222620] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/16/2022]
Abstract
Detailed information about the spatial distribution of the population is crucial for analyzing economic growth, environmental change, and natural disaster damage. Using nighttime light (NTL) imagery for population estimation has been a topic of interest in recent decades. However, the effectiveness of NTL data in population estimation has been impeded by limitations such as the blooming effect and underestimation in rural regions. To overcome these limitations, we combine the NPP-VIIRS day/night band (DNB) data with normalized difference vegetation index (NDVI) and land surface temperature (LST) data derived from the moderate resolution imaging spectroradiometer (MODIS) onboard the Terra satellite, to create a new vegetation temperature light population index (VTLPI). A statistical model is developed to predict 250 m grid-level population density based on the proposed VTLPI and the least squares regression approach. A case study is then implemented using data for Sichuan Province, China, in 2015, and the results indicate that the VTLPI-estimated population density outperformed the results from two other methods based on nighttime light imagery or the human settlement index, and the three published population products, LandScan, WorldPop, and GPW. When using the census data as reference, the mean relative error and median absolute relative error at the township level are 0.29 and 0.12, respectively, and the root-mean-square error is 212 persons/km². The results show that our VTLPI-based model can achieve a better estimation of population density in rural areas and urban suburbs and characterize more spatial variation at the 250 m grid level in both urban and rural areas. The resultant population density offers better population exposure data for assessing natural disaster risk and loss as well as other related applications.
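The three accuracy measures reported above can be computed as in the following minimal sketch. The abstract does not give their formulas, so this assumes the relative error of each unit is |estimate - reference| / reference; the function name `error_summary` and the sample values are hypothetical.

```python
import math
from statistics import mean, median

def error_summary(estimated, reference):
    """Summarize agreement between estimated and reference densities.

    Assumes every reference value is non-zero and that relative error is
    taken in absolute value (an assumption; the abstract gives no formula).
    """
    pairs = list(zip(estimated, reference))
    relative = [abs(est - ref) / ref for est, ref in pairs]
    rmse = math.sqrt(mean((est - ref) ** 2 for est, ref in pairs))
    return {
        "mean_relative_error": mean(relative),
        "median_abs_relative_error": median(relative),
        "rmse": rmse,
    }

# Hypothetical township-level densities (persons/km²), for illustration only.
summary = error_summary([110.0, 180.0, 330.0], [100.0, 200.0, 300.0])
print(summary)
```

A mean relative error well above the median, as in the reported 0.29 versus 0.12, usually points to a few units with large relative errors.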
4
Uhl JH, Zoraghein H, Leyk S, Balk D, Corbane C, Syrris V, Florczyk AJ. Exposing the urban continuum: Implications and cross-comparison from an interdisciplinary perspective. International Journal of Digital Earth 2018; 13:22-44. [PMID: 33014125 PMCID: PMC7531615 DOI: 10.1080/17538947.2018.1550120] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/01/2018] [Accepted: 11/13/2018] [Indexed: 06/11/2023]
Abstract
There is an increasing availability of geospatial data describing patterns of human settlement and population such as various global remote-sensing based built-up land layers, fine-grained census-based population estimates, and publicly available cadastral and building footprint data. This development constitutes new integrative modelling opportunities to characterize the continuum of urban, peri-urban, and rural settlements and populations. However, little research has been done regarding the agreement between such data products in measuring human presence which is measured by different proxy variables (i.e., presence of built-up structures derived from different remote sensors, census-derived population counts, or cadastral land parcels). In this work, we quantitatively evaluate and cross-compare the ability of such data to model the urban continuum, using a unique, integrated validation database of cadastral and building footprint data, U.S. census data, and three different versions of the Global Human Settlement Layer (GHSL) derived from remotely sensed data. We identify advantages and shortcomings of these data types across different geographic settings in the U.S., which will inform future data users on implications of data accuracy and suitability for a given application, even in data-poor regions of the world.
Affiliation(s)
- Johannes H Uhl
  - Department of Geography, University of Colorado Boulder, Boulder, Colorado, USA
- Stefan Leyk
  - Department of Geography, University of Colorado Boulder, Boulder, Colorado, USA
- Deborah Balk
  - Institute for Demographic Research and Baruch College, City University of New York, New York City, New York, USA
- Christina Corbane
  - Directorate for Space, Security & Migration, European Commission, Joint Research Centre (JRC), Ispra, Italy
- Vasileios Syrris
  - Directorate for Space, Security & Migration, European Commission, Joint Research Centre (JRC), Ispra, Italy
- Aneta J Florczyk
  - Directorate for Space, Security & Migration, European Commission, Joint Research Centre (JRC), Ispra, Italy
5
Zoraghein H, Leyk S. Data-enriched Interpolation for Temporally Consistent Population Compositions. GIScience & Remote Sensing 2018; 56:430-461. [PMID: 31889937 PMCID: PMC6936759 DOI: 10.1080/15481603.2018.1509463] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/12/2018] [Accepted: 08/02/2018] [Indexed: 05/29/2023]
Abstract
This research evaluates the performance of areal interpolation coupled with dasymetric refinement to estimate different demographic attributes, namely population sub-groups based on race, age structure and urban residence, within consistent census tract boundaries from 1990 to 2010 in Massachusetts. The creation of such consistent estimates facilitates the study of the nuanced micro-scale evolution of different aspects of population, which is impossible using temporally incompatible small-area census geographies from different points in time. Various unexplored ancillary variables, including the Global Human Settlement Layer (GHSL), the National Land-Cover Database (NLCD), parcels, building footprints and the proprietary ZTRAX® dataset are utilized for dasymetric refinement prior to areal interpolation to examine their effectiveness in improving the accuracy of multi-temporal population estimates. Different areal interpolation methods including Areal Weighting (AW), Target Density Weighting (TDW), Expectation Maximization (EM) and its data-extended approach are coupled with different dasymetric refinement scenarios based on these ancillary variables. The resulting consistent small area estimates of white and black subpopulations, people of age 18-65 and urban population show that dasymetrically refined areal interpolation is particularly effective when the analysis spans a longer time period (1990-2010 instead of 2000-2010) and the enumerated population is sufficiently large (e.g., counts of white vs. black). The results also demonstrate that current census-defined urban areas overestimate the spatial distribution of urban population and dasymetrically refined areal interpolation improves estimates of urban population. Refined TDW using building footprints or the ZTRAX® dataset outperforms all other methods. 
The implementation of areal interpolation enriched by dasymetric refinement represents a promising strategy to create more reliable multi-temporal and consistent estimates of different population subgroups and thus demographic compositions. This methodological foundation has the potential to advance micro-scale modeling of various subpopulations, particularly urban population to inform studies of urbanization and population change over time as well as future population projections.
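Two of the interpolation methods named in this abstract, Areal Weighting (AW) and Target Density Weighting (TDW), can be sketched in a few lines. This is a simplified illustration under common textbook formulations, not the authors' code: the function name `interpolate`, the input layout, and the example zones are assumptions, and dasymetric refinement (restricting areas to inhabited land before interpolating) is left out.

```python
from collections import defaultdict

def interpolate(source_counts, intersections, target_density=None):
    """Reallocate source-zone counts to target zones.

    source_counts:  {source_id: count}
    intersections:  iterable of (source_id, target_id, intersection_area)
    target_density: optional {target_id: ancillary density}. If omitted,
                    weights are intersection areas (AW). If given, each
                    area is scaled by its target zone's density (TDW).
    """
    pieces = []
    source_weight = defaultdict(float)
    for source_id, target_id, area in intersections:
        density = target_density[target_id] if target_density else 1.0
        weight = area * density
        pieces.append((source_id, target_id, weight))
        source_weight[source_id] += weight

    estimates = defaultdict(float)
    for source_id, target_id, weight in pieces:
        if source_weight[source_id] > 0:
            # Each source count is split in proportion to piece weights,
            # so totals are preserved (pycnophylactic property).
            share = weight / source_weight[source_id]
            estimates[target_id] += source_counts[source_id] * share
    return dict(estimates)

# Hypothetical zones, for illustration only.
counts = {"S1": 100.0, "S2": 60.0}
overlaps = [("S1", "T1", 2.0), ("S1", "T2", 2.0), ("S2", "T2", 3.0)]
print(interpolate(counts, overlaps))                          # areal weighting
print(interpolate(counts, overlaps, {"T1": 3.0, "T2": 1.0}))  # target density weighting
```

With equal intersection areas, AW splits `S1` evenly between the two targets, while TDW shifts more of it to the target zone with the higher ancillary density.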
Affiliation(s)
- Hamidreza Zoraghein
  - Corresponding author. Guggenheim 110, 260 UCB, Boulder, Colorado 80309, USA